1 Introduction

Climate change and air quality are influenced by fluxes of greenhouse gases, reactive gas emissions and aerosols. The temporal evolution of reactive chemistry in the atmosphere is usually modeled by atmospheric chemistry transport models. Poorly known initial values, sources and sinks seriously degrade the quality of the simulation, a problem addressed by data assimilation and inverse modeling (e.g. Sandu and Chai (2011)). In practical data assimilation, the number of observations is typically markedly lower than the number of degrees of freedom of the model (see Daley (1991)). In order to improve the quality of data assimilation and inverse modeling results, several aspects can be considered.

Firstly, the design of the observation network can be treated as an optimization problem subject to given external constraints, which has been addressed traditionally by Observation System Simulation Experiments [OSSEs, e.g. Daley (1991)]. The advanced concept of targeted observations was popularized during the FASTEX campaign [e.g. Langland et al. (1999), Szunyogh et al. (1999)]. Theoretical studies are presented, for example, by Berliner et al. (1999), or recently by Bellsky et al. (2014) for a case study of highly nonlinear dynamics and Wu et al. (2016) for the optimal deployment of observations for time-varying systems in an infinite-dimensional domain within a finite time interval. Secondly, the benefit of individual observations or types of measurements has been assessed by Cardinali et al. (2004) and a sequence of related papers. Thirdly, the need to quantify the information provided by the observations can be satisfied by suitably selected measurements. Singular value decomposition (SVD) is a well-known tool applied to identify the priorities of observations by detecting the fastest growing uncertainties in meteorological models (e.g. Bousserez and Henze (2018), Buizza and Palmer (1995), Kang and Xu (2012), Khattatov et al. (1999), Liao et al. (2006), Lorenz (1965), Sandu et al. (2013), Singh et al. (2013), Spantini et al. (2015)). Besides, due to the importance of the sensitivity analysis of the model evolution with respect to the model errors and the observation network, the concept of the degree of freedom for signal (DFS) is also frequently applied to satellite retrieval problems, which are typically of significantly lower dimension than data assimilation problems [see for example Eyre (1990), Fisher (2003), Martynenko et al. (2010), Rabier et al. (2002), Rodgers (2000)]. Several methodologies have been proposed to account for model and estimation errors in both variational and ensemble data assimilation [e.g. Bellsky et al. (2014), Daescu and Navon (2004), Gautam et al. (2021), Gautam et al. (2021), Li et al. (2009), Sharma et al. (2014), Smith et al. (2013), Zupanski et al. (2007)]. Navon (1997) outlined the perceptibility and stability in optimal parameter estimation in meteorology and oceanography. Cioaca and Sandu (2014b) introduced a general framework to optimize a set of parameters controlling the 4D-Var data assimilation system, which was applied to the state and other parameters of a shallow water model in Cioaca and Sandu (2014a).

Most studies cited above are based on the classical data assimilation problem, in which the initial value or the prognostic state variable is the only parameter to be optimized. However, for chemistry transport or greenhouse gas models driven by emissions in the troposphere, the optimization of emission rates is at least as important as that of the initial value. In order to obtain better analyses from combining the model with observations, efforts towards joint optimization have been made by adding emission rates to concentrations in a number of studies, such as Bocquet and Sakov (2013), Elbern and Schmidt (1999), Elbern et al. (2000), Elbern et al. (2007), Goris and Elbern (2013), Goris and Elbern (2015), Miyazaki et al. (2012), Winiarek et al. (2014). However, the inability to observe and estimate surface emission fluxes directly is still a major roadblock hampering progress in the predictive skill of climate and atmospheric chemistry models. For example, studies focusing on urban area emissions interacting with longer-range transport need to discriminate between initial values driven by external emissions, distant emissions like biomass burning, and local emissions [e.g. Duarte et al. (2021), Kaskaoutis et al. (2014), Kumar et al. (2019)]. In all cases, the question of model and emission uncertainty is crucial, as it blurs the model-based capacity to discriminate between initial-value-controlled and emission-rate-controlled simulations. This is especially challenging in the case of biogenic emissions, which require related knowledge of plant and soil properties [e.g. Vogel and Elbern (2021a), Vogel and Elbern (2021b)].

Therefore, in this paper we establish a dynamic model of emission rates and extend the traditional chemistry transport model with it. This gives the initial value and emission rates the same importance in the optimization. Based on the extended model, we investigate an approach to identify and assess the potential observability and sensitivity of any given observation configuration to the initial value and emission rates individually for atmospheric transport-diffusion models, with the (ensemble) Kalman smoother as the underlying data assimilation method. Further, the sensitivities of the initial value and emission rates allow us to quantitatively balance the weights between them. This helps us to determine the sensitive parameters and to quantitatively build the initial variance matrices for both concentrations and emission rates in advance of the data assimilation process, so that computational resources can be saved and the accuracy of the optimization can be improved.

The rest of the paper is organized as follows. In Sect. 2, we establish the dynamic model for emissions and extend the original atmospheric transport model by emission rates in a novel way. In Sect. 3, based on the Kalman smoother, we present the specific approach to determine the degree of freedom for signal for both initial values and emission rates. In Sect. 4, we develop the ensemble approach to evaluate the degree of freedom for signal of the initial value and emission rates based on the non-singularity of the background covariance matrix. In Sect. 5, we identify the sensitive directions of the initial values and emission rates separately by maximizing the ratio of the magnitudes of the observation perturbation and the initial perturbation. This also makes it possible to estimate the sensitivities of the initial value and emission rates from a few leading singular values and singular vectors. In Sect. 6, we extend a 3D advection-diffusion equation with the dynamic model of the emission rate and give several elementary experiments to verify and demonstrate the ensemble approach. Finally, in Sect. 7, we summarize the main contributions of this paper and discuss possible extensions.

2 Atmospheric inverse modeling extended by emission rates

We usually describe the concentration change rate by the following prognostic atmospheric transport model

$$\begin{aligned} \dfrac{dc(t)}{dt}={\mathcal {A}}(c)+e(t), \end{aligned}$$
(1)

where \({\mathcal {A}}\) is a nonlinear model operator, and c(t) and e(t) are the vectors of chemical constituent concentrations and emission rates at time t, respectively.

The prior estimate of the state vector of concentrations c(t) is given and denoted by \(c_b(t)\), termed as the background state. The prior estimate of emission rates, usually taken from emission inventories, is denoted by \(e_b(t)\).

Let \({\mathbf {A}}\) be the tangent linear operator of \({\mathcal {A}}\), \(\delta c(t_0)=c(t_0)-c_{b}(t_0)\) and \(\delta e(t)=e(t)-e_{b}(t)\). The linear evolution of the perturbation of c(t) follows the tangent linear model as

$$\begin{aligned} \frac{d\delta c}{dt}={\mathbf {A}}\delta c+\delta e(t). \end{aligned}$$
(2)

By the discretization of the tangent linear model in space, it is straightforward to obtain the linear solution of (2) discretized in space and continuous in time as

$$\begin{aligned} \delta c(t)=M(t,t_0)\delta c(t_0)+\int _{t_0}^t M(t, s)\delta e(s)ds, \end{aligned}$$
(3)

where \(M(\cdot ,\cdot )\) is the resolvent obtained from the spatial discretization of \({\mathbf {A}}\). Without loss of generality, we assume \(\delta c(t)\in {\mathbf {R}}^{n}\), \(\delta e(t)\in {\mathbf {R}}^{n}\), where n is the dimension of the partial phase space of concentrations and emission rates. Obviously, \(M(\cdot ,\cdot )\in {\mathbf {R}}^{n\times n}\).

In addition, let y(t) be the observation vector of c(t) and define

$$\begin{aligned} \delta y(t)=y(t)-{\mathcal {H}}(c_b)(t), \end{aligned}$$
(4)

where \(\delta y(t)\in {\mathbf {R}}^{m(t)}\), m(t) is the dimension of the phase space of observation configurations at time t. \({\mathcal {H}}(t)\) is a nonlinear forward observation operator mapping the model space to the observation space. Linearizing the nonlinear operator \({\mathcal {H}}\) as H, we present the observation system as

$$\begin{aligned} \delta y(t)=H(t)\delta c(t)+\nu (t), \end{aligned}$$
(5)

where the observation error \(\nu (t)\) is Gaussian with zero mean and covariance matrix \(R(t)\in {\mathbf {R}}^{m(t)\times m(t)}\).

The Kalman smoother is a recursive estimator that provides the best linear unbiased estimate (BLUE) of the unknown variables, together with error estimates, from a sequence of observations [e.g. Gelb (1974)]. In contrast to 4D-Var approaches, Kalman smoothers not only provide the best linear unbiased estimate of the state vector from a series of observations over time, but also update the forecast error covariances of that estimate.

If the initial state of concentrations is the only parameter to be optimized, we can take the concentrations alone as the model state and apply the Kalman filter and smoother directly to the tangent linear model (2) with observations (5). However, in most cases the exact values of emission rates are poorly known and must also be treated as parameters to be optimized. It has been shown by Elbern et al. (2007) that the diurnal profiles are better known than the exact amplitude of emission rates. Hence, we consider only the amplitude of the diurnal emission cycle as the optimization parameter. Thus, we first reformulate the background evolution of emission rates from time s to t in a dynamic form as an emission model

$$\begin{aligned} e_{b}(t)=M_e(t,s)e_{b}(s), \quad s\leqslant t, \end{aligned}$$
(6)

where \(e_b(\cdot )\) is an n-dimensional vector, the \(i^{th}\) element of \(e_b(\cdot )\) is denoted by \(e_b^i(\cdot )\) and \(M_e(t,s)\) is the diagonal scaling matrix defined as

$$\begin{aligned} M_e(t,s)=\left( \begin{array}{cccc} \frac{e_b^1(t)}{e_b^1(s)}&{} &{} &{}\\ &{} \frac{e_b^2(t)}{e_b^2(s)} &{} &{} \\ &{} &{} \ddots &{}\\ &{} &{} &{} \frac{e_b^n(t)}{e_b^n(s)}\\ \end{array} \right) . \end{aligned}$$
(7)

We assume that the amplitude of emission rates is the only parameter to be optimized and then establish the dynamic model of emission rates subject to the above background evolution

$$\begin{aligned} \delta e(t)=M_e(t,s)\delta e(s), \quad s\leqslant t. \end{aligned}$$
(8)
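As a minimal sketch of the emission model (6)-(8), the following NumPy snippet builds the diagonal scaling matrix \(M_e(t,s)\) from a background emission profile and propagates an amplitude perturbation with it. The profile values, the amplitude factors and the function name `emission_scaling` are hypothetical and serve only as an illustration.

```python
import numpy as np

def emission_scaling(e_b_t, e_b_s):
    """Diagonal matrix M_e(t, s) with entries e_b^i(t) / e_b^i(s), cf. (7)."""
    e_b_t = np.asarray(e_b_t, dtype=float)
    e_b_s = np.asarray(e_b_s, dtype=float)
    if np.any(e_b_s == 0.0):
        raise ValueError("background emission at time s must be nonzero")
    return np.diag(e_b_t / e_b_s)

# Hypothetical background emissions of n = 3 sources at three times (rows).
e_b = np.array([[1.0, 2.0, 4.0],
                [1.5, 3.0, 2.0],
                [3.0, 1.0, 8.0]])

M_10 = emission_scaling(e_b[1], e_b[0])
M_21 = emission_scaling(e_b[2], e_b[1])
M_20 = emission_scaling(e_b[2], e_b[0])

# Perturb the amplitude of each source at t_0 by an illustrative factor.
factors = np.array([0.05, -0.10, 0.20])
delta_e0 = factors * e_b[0]
delta_e2 = M_20 @ delta_e0      # perturbation at t_2 according to (8)
```

In this sketch the scaling matrices compose as \(M_e(t_2,t_1)M_e(t_1,t_0)=M_e(t_2,t_0)\), and the relative perturbation `delta_e2 / e_b[2]` equals `factors`, which is the sense in which the diurnal profile of the background is preserved.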

Several studies [e.g. Gelb (1974)] state that the estimate of a variable x by the fixed-interval Kalman smoother equals the conditional expectation given the observations within the entire time interval, denoted by \({\mathbf {E}}[x\vert \{y(t_{obs}), t_{obs}\in [t_0,t_N]\}]\). With the emission model (8), the estimate of e(t) by the Kalman smoother on \([t_0,t_N]\) follows from the linearity of the conditional expectation,

$$\begin{aligned}&{\mathbf {E}}[e(t)\vert \{y(t_{obs}), t_{obs}\in [t_0,t_N]\}]\nonumber \\= & {} {\mathbf {E}}[M_e(t,s)e(s)\vert \{y(t_{obs}),t_{obs}\in [t_0,t_N]\}]\nonumber \\= & {} M_e(t,s){\mathbf {E}}[e(s)\vert \{y(t_{obs}), t_{obs}\in [t_0,t_N]\}]. \end{aligned}$$
(9)

It implies that BLUEs of emission rates with the dynamic model (8) preserve the same diurnal profiles of the background of emission rates.

By rewriting (3) as \(\delta c(t)=M(t,t_0)\delta c(t_0)+\int _{t_0}^t M(t, s)M_e(s,t_0)\delta e(t_0)ds,\) we obtain the transport model with the state vector extended by emission rates

$$\begin{aligned} \left( \begin{array}{c} \delta c(t)\\ \delta e(t) \end{array} \right) = \left( \begin{array}{cc} M(t,t_0) &{} \int _{t_0}^t M(t, s)M_e(s,t_0)ds\\ 0 &{} M_e(t,t_0) \end{array} \right) \left( \begin{array}{c} \delta c(t_0)\\ \delta e(t_0) \end{array} \right) . \end{aligned}$$
(10)
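The block structure of (10) can be illustrated in discrete time. The sketch below assumes an explicit one-step forcing \(\delta c_{k+1}=M\delta c_k+\triangle t\,\delta e_k\), which is an illustrative discretization and not necessarily the scheme used later in the experiments; the toy matrices and the helper name `extended_step` are likewise assumptions.

```python
import numpy as np

def extended_step(M, M_e, dt):
    """One-step transition of (delta c, delta e), a discrete analogue of (10).

    Assumes delta c_{k+1} = M delta c_k + dt * delta e_k (explicit forcing).
    """
    n = M.shape[0]
    top = np.hstack([M, dt * np.eye(n)])
    bottom = np.hstack([np.zeros((n, n)), M_e])
    return np.vstack([top, bottom])

rng = np.random.default_rng(0)
n, dt, steps = 3, 0.5, 4
M = 0.9 * np.eye(n) + 0.05 * rng.standard_normal((n, n))   # toy resolvent
M_e_list = [np.diag(rng.uniform(0.5, 1.5, n)) for _ in range(steps)]

# Accumulate the extended resolvent from t_0 to the final time.
M_ext = np.eye(2 * n)
for M_e in M_e_list:
    M_ext = extended_step(M, M_e, dt) @ M_ext

# Propagate concentrations and emissions separately for comparison.
dc0, de0 = rng.standard_normal(n), rng.standard_normal(n)
dc, de = dc0.copy(), de0.copy()
for M_e in M_e_list:
    dc, de = M @ dc + dt * de, M_e @ de

x_final = M_ext @ np.concatenate([dc0, de0])
```

The accumulated matrix keeps a zero lower-left block, so the emission perturbation evolves independently of the concentrations, while the upper-right block collects the discrete counterpart of the integral term in (10).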

Typically, there is no direct observation for emissions, apart from the flux tower observations used for carbon dioxide, which are not considered here. Therefore, we can reformulate the observation mapping as

$$\begin{aligned} \delta y(t)=(H(t), 0_{m(t)\times n})\left( \begin{array}{c} \delta c(t)\\ \delta e(t) \end{array} \right) +\nu (t), \end{aligned}$$
(11)

where \(0_{m(t)\times n}\) is an \(m(t)\times n\) matrix with zero elements, so that the extended operator has the same number of rows as \(H(t)\).
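A small sketch of the extended observation mapping (11) follows. The zero block is given as many rows as \(H(t)\) so that the matrix product is defined; the dimensions, the operator entries and the helper name `extend_observation_operator` are hypothetical.

```python
import numpy as np

def extend_observation_operator(H):
    """Return (H, 0): observations see concentrations but not emissions, cf. (11)."""
    H = np.atleast_2d(np.asarray(H, dtype=float))
    m, n = H.shape
    return np.hstack([H, np.zeros((m, n))])

# Hypothetical setup: n = 4 concentration states, m = 2 of them observed.
H = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
H_ext = extend_observation_operator(H)

delta_c = np.array([0.3, -0.1, 0.7, 0.2])
delta_e = np.array([5.0, 5.0, 5.0, 5.0])   # no direct effect on delta y
delta_y = H_ext @ np.concatenate([delta_c, delta_e])
```

Emissions therefore influence the observations only indirectly, through the model evolution of the concentrations.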

It is now clear that both concentrations and emission rates are included in the state vector of the homogeneous model (10). It allows us to apply the Kalman smoother in a fixed time interval \([t_0,t_N]\) in order to optimize both parameters. Besides, a more general case of the transport model extended by emission is shown in Appendix A.

3 The degree of freedom for signal of concentrations and emissions

In this section we will introduce the theoretical approach to determine the DFS of concentrations and emissions, resting on the extended model in Sect. 2. This approach gives us access to determine the potential ability of observations to optimize each variable of the above extended model, based on the Kalman smoother within a finite-time interval.

For convenience, we generalize the atmospheric transport model (10) by the following discrete-time linear system on the time interval \([t_0,t_1,\cdots , t_N]\) as

$$\begin{aligned}&x(t_{k+1})=M(t_{k+1},t_k)x(t_k)+\varepsilon (t_k), \end{aligned}$$
(12)
$$\begin{aligned}&y(t_k)=H(t_k)x(t_k)+\nu (t_k), \end{aligned}$$
(13)

where \(x(\cdot )\in {\mathbf {R}}^n\) is the state variable and \(y(t_k)\in {\mathbf {R}}^{m(t_k)}\) is the observation vector at time \(t_k\). The model error \(\varepsilon (t_k)\) and the observation error \(\nu (t_k)\), \(k=1,\cdots , N\), are Gaussian with zero mean. The model error covariance matrix is denoted by \(Q(t_k)\) and the observation error covariance matrix is denoted by \(R(t_k)\).

According to Appendix B, applying the singular value decomposition \(P^{\frac{1}{2}}(t_0\vert t_{-1}){\mathcal {G}}^{\top }{\mathcal {R}}^{-\frac{1}{2}}=VS U^{\top },\) we obtain

$$\begin{aligned} {\tilde{P}}=P^{-\frac{1}{2}}(t_0\vert t_{-1})(P(t_0\vert t_{-1})-P(t_0\vert t_N))P^{-\frac{1}{2}}(t_0\vert t_{-1})=\sum _{i=1}^{r}\dfrac{s_i^2}{1+s_i^2}v_{i}v_{i}^{\top }, \end{aligned}$$
(14)

where \(v_{i}\) is the \(i^{th}\) left singular vector in V related to the singular value \(s_i\), which is the \(i^{th}\) element on the diagonal of S.
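The identity (14) can be checked numerically on a toy problem. The sketch below assumes a linear Gaussian setting without model error, in which the smoothed initial covariance is \(P(t_0\vert t_N)=(P^{-1}(t_0\vert t_{-1})+{\mathcal {G}}^{\top }{\mathcal {R}}^{-1}{\mathcal {G}})^{-1}\); this closed form, the random matrices and the helper `sym_sqrt` are assumptions made for the illustration.

```python
import numpy as np

def sym_sqrt(P):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, Q = np.linalg.eigh(P)
    return (Q * np.sqrt(w)) @ Q.T

rng = np.random.default_rng(1)
n, m = 5, 3                              # state and stacked observation dimensions
A = rng.standard_normal((n, n))
P0 = A @ A.T + n * np.eye(n)             # prior covariance P(t_0 | t_{-1})
G = rng.standard_normal((m, n))          # stacked observation-to-initial-state map
R = np.diag(rng.uniform(0.5, 2.0, m))    # stacked observation error covariance

P0_half = sym_sqrt(P0)
P0_inv_half = np.linalg.inv(P0_half)
R_inv_half = np.diag(1.0 / np.sqrt(np.diag(R)))

# SVD of P^{1/2} G^T R^{-1/2} = V S U^T.
V, s, _ = np.linalg.svd(P0_half @ G.T @ R_inv_half, full_matrices=False)
weights = s**2 / (1.0 + s**2)
P_tilde_svd = (V * weights) @ V.T        # right-hand side of (14)

# Direct evaluation using the assumed smoothed covariance.
P_smooth = np.linalg.inv(np.linalg.inv(P0) + G.T @ np.linalg.inv(R) @ G)
P_tilde_direct = P0_inv_half @ (P0 - P_smooth) @ P0_inv_half
```

Both routes give the same matrix up to rounding, and only the \(r\) nonzero singular values contribute to the sum.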

It is clear that the trace of \({\tilde{P}}\) can be used to evaluate the total improvements of model states. Thus, the nuclear norm is appropriately taken as the metric, which is defined as

$$\begin{aligned} \Vert A \Vert _{1}=\text {tr}(\sqrt{A^{\top } A}), \end{aligned}$$
(15)

where A is any matrix and \(\text{ tr }(\cdot )\) denotes the trace of the matrix.

From (14), we obtain

$$\begin{aligned} \Vert {\tilde{P}}\Vert _1= \text {tr}({\tilde{P}})=\sum _{i=1}^{r}\dfrac{s_i^2}{1+s_i^2}. \end{aligned}$$
(16)

This is well-known as the degree of freedom for signal (DFS) of the model (e.g. Rodgers (2000)).

It is obvious that \(\Vert {\tilde{P}}\Vert _1<\Vert I\Vert _1=n\). Here n can be considered as the total relative improvement if the system is completely observed. Thus, if we consider the ratio

$$\begin{aligned} {\tilde{p}}=\dfrac{\Vert {\tilde{P}}\Vert _1}{\Vert I\Vert _1}=\dfrac{\Vert {\tilde{P}}\Vert _1}{n}\in [0,1), \end{aligned}$$
(17)

the percentage of the total improvement of the model is obtained, which is henceforth called the relative degree of freedom for signal.
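As a worked illustration of (16) and (17), the following sketch evaluates the DFS and the relative DFS from a set of singular values. The singular values and the state dimension are hypothetical.

```python
import numpy as np

def dfs_from_singular_values(s, n):
    """Return (DFS, relative DFS) as in (16) and (17)."""
    s = np.asarray(s, dtype=float)
    dfs = float(np.sum(s**2 / (1.0 + s**2)))
    return dfs, dfs / n

# Hypothetical singular values for a model with n = 4 states.
dfs, rel = dfs_from_singular_values([3.0, 1.0, 0.0], n=4)
```

Here the singular value 3 contributes 9/10, the singular value 1 contributes 1/2 and a zero singular value contributes nothing, so the DFS is 1.4 and the relative DFS is 0.35.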

In order to get a deeper insight into the potential capacity of the observation network to improve the estimation of all model states, we consider the corresponding value in the diagonal of \({\tilde{P}}\) as the contribution of the degree of freedom for signal. We denote the \(j^{th}\) element on the diagonal of \({\tilde{P}}\) by \({\tilde{P}}_{j}\). From (47), the contribution of the \(j^{th}\) element of \(x(t_0)\) to the degree of freedom for signal can be expressed as

$$\begin{aligned} {\tilde{P}}_j=\sum _{i=1}^{r}\dfrac{s_i^2}{1+s_i^2}(v_{ij})^2, \end{aligned}$$
(18)

where \(v_{ij}\) is the \(j^{th}\) element of \(v_i\).
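The per-state contributions in (18) can be computed without forming \({\tilde{P}}\) explicitly. In the sketch below, a random matrix stands in for \(P^{\frac{1}{2}}(t_0\vert t_{-1}){\mathcal {G}}^{\top }{\mathcal {R}}^{-\frac{1}{2}}\); this stand-in and the function name `dfs_contributions` are assumptions.

```python
import numpy as np

def dfs_contributions(V, s):
    """Diagonal of P-tilde: sum_i s_i^2/(1+s_i^2) * v_ij^2 for each state j, cf. (18)."""
    V = np.asarray(V, dtype=float)       # columns are the singular vectors v_i
    s = np.asarray(s, dtype=float)
    weights = s**2 / (1.0 + s**2)
    return (V**2) @ weights

rng = np.random.default_rng(2)
B = rng.standard_normal((6, 3))          # stand-in for P^{1/2} G^T R^{-1/2}
V, s, _ = np.linalg.svd(B, full_matrices=False)
contrib = dfs_contributions(V, s)

# Reference: the diagonal of the explicitly assembled matrix.
P_tilde = (V * (s**2 / (1.0 + s**2))) @ V.T
```

Because the singular vectors have unit length, the contributions sum to the DFS of (16).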

Besides, we can see that (14) enables us to discriminate the DFS contributed to different optimization parameters, which are here emission rates and the initial value. Without loss of generality, we divide (14) into the following block matrix according to the dimension of c and e

$$\begin{aligned} {\tilde{P}} =\left( \begin{array}{cc} {\tilde{P}}^c &{} {\tilde{P}}^{ce} \\ {\tilde{P}}^{ec} &{} {\tilde{P}}^e \end{array}\right) = \sum _{i=1}^{2n}\dfrac{s_i^2}{1+s_i^2}\left( \begin{array}{c} v_{i}^c\\ v_{i}^e \end{array}\right) (v_{i}^{{c}^{\top }}, v_{i}^{{e}^{\top }})\in {\mathbf {R}}^{2n\times 2n}, \end{aligned}$$
(19)

where \((v_i^{c^{\top }},v_i^{e^{\top }})^{\top }=v_i\).

It is easy to see that

$$\begin{aligned} {\tilde{P}}^c= \sum _{i=1}^{2n}\dfrac{s_i^2}{1+s_i^2} v_{i}^cv_{i}^{{c}^{\top }}, \quad {\tilde{P}}^e= \sum _{i=1}^{2n}\dfrac{s_i^2}{1+s_i^2} v_{i}^ev_{i}^{{e}^{\top }}. \end{aligned}$$
(20)

Further, the degrees of freedom for signal of the \(j^{th}\) elements of \(c(t_0)\) and \(e(t_0)\) are given by

$$\begin{aligned} {\tilde{P}}_j^c= \sum _{i=1}^{2n}\dfrac{s_i^2}{1+s_i^2} (v_{ij}^{c})^2, \quad {\tilde{P}}_j^e= \sum _{i=1}^{2n}\dfrac{s_i^2}{1+s_i^2} (v_{ij}^{e})^2, \end{aligned}$$
(21)

where \(v_{ij}^{c}\) and \(v_{ij}^{e}\) are the \(j^{th}\) elements of \(v_i^c\) and \(v_i^e\), respectively.

Moreover, the degree of freedom for signal of the concentration \(\Vert {\tilde{P}}^c \Vert _{1}\) and emission rates \(\Vert {\tilde{P}}^e \Vert _{1}\) are calculated by

$$\begin{aligned} \Vert {\tilde{P}}^c \Vert _{1}= \sum _{i=1}^{2n}\dfrac{s_i^2}{1+s_i^2} \text {tr}(v_{i}^cv_{i}^{{c}^{\top }}), \quad \Vert {\tilde{P}}^e \Vert _{1}= \sum _{i=1}^{2n}\frac{s_i^2}{1+s_i^2} \text {tr}(v_{i}^ev_{i}^{{e}^{\top }}). \end{aligned}$$
(22)

It is worth noticing that

$$\begin{aligned}&{\tilde{P}}^c=(P^{c}(t_0\vert t_{-1}))^{-\frac{1}{2}}(P^{c}(t_0\vert t_{-1})-P^c(t_0\vert t_N))(P^{c}(t_0\vert t_{-1}))^{-\frac{1}{2}} \end{aligned}$$
(23)
$$\begin{aligned}&{\tilde{P}}^e=(P^{e}(t_0\vert t_{-1}))^{-\frac{1}{2}}(P^{e}(t_0\vert t_{-1})-P^e(t_0\vert t_N))(P^{e}(t_0\vert t_{-1}))^{-\frac{1}{2}} \end{aligned}$$
(24)

if and only if there is no prior correlation between the initial concentration and emission rates, that is, \(P^{ce}(t_0\vert t_{-1})=0_{n\times n}\). In this case, the corresponding relative degrees of freedom for signal of the concentration and emission rates are defined as

$$\begin{aligned} {\tilde{p}}^{c}=\frac{\Vert {\tilde{P}}^{c}\Vert _1}{n}, \quad {\tilde{p}}^{e}=\frac{\Vert {\tilde{P}}^{e}\Vert _1}{n}. \end{aligned}$$
(25)

From (17), \({\tilde{p}}^{c}\in [0,1)\) and \({\tilde{p}}^{e}\in [0,1)\) can be read as the relative improvements of the concentration and emission rates, respectively. However, efficient observation networks ideally lead to values close to 1 for both of them, such that

$$\begin{aligned} \frac{\Vert {\tilde{P}}^c\Vert _1}{n}+\frac{\Vert {\tilde{P}}^e\Vert _1}{n}>1. \end{aligned}$$
(26)

This is because the normalization of \({\tilde{P}}\) is with respect to the extended covariance matrix \(P(t_0|t_{-1})\) only, rather than to the covariance matrices \(P^c(t_0|t_{-1})\) and \(P^e(t_0|t_{-1})\) individually. The relative degree of freedom for signal therefore cannot serve our objective of distinguishing the observability of the concentration and emission rates. From the block form of \({\tilde{P}}\), we have

$$\begin{aligned} \Vert {\tilde{P}}^c \Vert _{1}+\Vert {\tilde{P}}^e \Vert _{1}=\Vert {\tilde{P}} \Vert _{1}. \end{aligned}$$
(27)

Thus, in order to compare the potential improvements of the concentration and emission rates separately, we define a relative ratio of the degree of freedom for signal for the concentration or emission rates as

$$\begin{aligned} {\tilde{p}}^c=\frac{\Vert {\tilde{P}}^c\Vert _1}{\Vert {\tilde{P}} \Vert _{1}}, \quad {\tilde{p}}^e=\frac{\Vert {\tilde{P}}^e\Vert _1}{\Vert {\tilde{P}} \Vert _{1}}, \quad {\tilde{p}}^e+{\tilde{p}}^c\equiv 1. \end{aligned}$$
(28)
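The splitting (27) and the relative ratios (28) can be sketched as follows. Since the diagonal blocks of \({\tilde{P}}\) are positive semidefinite, their nuclear norms equal their traces. The random matrix standing in for \(P^{\frac{1}{2}}(t_0\vert t_{-1}){\mathcal {G}}^{\top }{\mathcal {R}}^{-\frac{1}{2}}\) and the helper name `dfs_split` are assumptions.

```python
import numpy as np

def dfs_split(P_tilde, n):
    """Split the DFS of the extended state into concentration and emission parts.

    Assumes the total DFS is nonzero, otherwise the ratios are undefined.
    """
    dfs_c = float(np.trace(P_tilde[:n, :n]))
    dfs_e = float(np.trace(P_tilde[n:, n:]))
    total = dfs_c + dfs_e
    return dfs_c, dfs_e, dfs_c / total, dfs_e / total

rng = np.random.default_rng(3)
n, m = 4, 3
B = rng.standard_normal((2 * n, m))      # stand-in for P^{1/2} G^T R^{-1/2}
V, s, _ = np.linalg.svd(B, full_matrices=False)
P_tilde = (V * (s**2 / (1.0 + s**2))) @ V.T

dfs_c, dfs_e, p_c, p_e = dfs_split(P_tilde, n)
```

A value of `p_c` above 0.5 would indicate, in this toy setting, that the observations constrain the initial concentrations more strongly than the emission rates.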

If the degree or relative degree of freedom for signal of the observation network within the assimilation window is almost zero, an improvement cannot be expected. In contrast, \(\{{\tilde{P}}_j^c\}_{j=1}^{n}\) and \(\{{\tilde{P}}_j^e\}_{j=1}^{n}\), which show the improvement of each parameter j of concentrations and emission rates respectively, help us to determine which parameters can be expected to be optimized by the existing observation configurations. Furthermore, comparing \({\tilde{p}}^c\) with \({\tilde{p}}^e\), we can conclude that the estimate of the one with the larger relative ratio of the degree of freedom for signal can be improved more efficiently by the existing observation configurations than the other. In other words, if \({\tilde{p}}^c>{\tilde{p}}^e\), the existing observation configuration is more sensitive to the initial values of concentrations. Conversely, if \({\tilde{p}}^c <{\tilde{p}}^e\), the observation configurations can improve the estimate of emission rates better. According to \({\tilde{p}}^c\) and \({\tilde{p}}^e\), the relative weights between the concentration and emission rates can be identified quantitatively. In a data assimilation context, where observations are in a weighted relation to the background, the BLUE favors those parameters with higher observation efficiency.

The special case that \({\tilde{p}}^e\) is very close to zero implies that the observation network provides almost no information for the emission-rate optimization.

4 The ensemble approach to determine the DFS

The ensemble Kalman smoother (EnKS) is a frequently applied tool for problems with a large number of control variables in the field of data assimilation [e.g. Anderson (2001), Evensen (2009)]. In this section, in order to identify the potential capacities of observation networks to optimize the concentration and especially the poorly known emission rates for high-dimensional problems, we will introduce the ensemble-based version of the approach in Sect. 3.

According to Appendix C, analogous to Sect. 3, we have

$$\begin{aligned} {\bar{P}} &= {\bar{P}}^{\dag \frac{1}{2}}(t_0\vert t_{-1})({\bar{P}}(t_0\vert t_{-1})-{\bar{P}}(t_0\vert t_N)){\bar{P}}^{\dag \frac{1}{2}}(t_0\vert t_{-1})\nonumber \\&= \sum _{i=1}^{r}\dfrac{{\bar{s}}_i^2}{1+{\bar{s}}_i^2}{\bar{v}}_{i}{\bar{v}} _{i}^{\top }. \end{aligned}$$
(29)

Similar to Sect. 3, we can also divide \({\bar{P}}\) into the block form according to the dimensions of the concentration and emission rates. Correspondingly, we obtain the ensemble degree of freedom for signal of the \(j^{th}\) element of \(c(t_0)\) and \(e(t_0)\)

$$\begin{aligned} {\bar{P}}_j^c= \sum _{i=1}^{r}\dfrac{{\bar{s}}_i^2}{1+{\bar{s}}_i^2} ({\bar{v}}_{ij}^{c})^2, \quad {\bar{P}}_j^e= \sum _{i=1}^{r}\dfrac{{\bar{s}}_i^2}{1+{\bar{s}}_i^2} ({\bar{v}}_{ij}^{e})^2, \end{aligned}$$
(30)

where \({\bar{v}}_{ij}^{c}\) and \({\bar{v}}_{ij}^{e}\) are the \(j^{th}\) elements of \({\bar{v}}_i^c\) and \({\bar{v}}_i^e\), respectively and

$$\begin{aligned} ({\bar{v}}_i^{c^{\top }},{\bar{v}}_i^{e^{\top }})^{\top }={\bar{v}}_i. \end{aligned}$$

We observe that (29) and (14) have a similar form. By virtue of

$$\begin{aligned} {\bar{P}}^{\dag \frac{1}{2}}(t_0\vert t_{-1}){\bar{P}}_{xy}^f\mathcal {{\bar{R}}}^{-\frac{1}{2}} ={\bar{P}}^{\frac{1}{2}}(t_0\vert t_{-1}){\mathcal {G}}^{\top } \mathcal {{\bar{R}}}^{-\frac{1}{2}}, \end{aligned}$$
(31)

we find that the final results of (14) and (63) are equivalent. However, compared with \(P^{\frac{1}{2}}(t_0\vert t_{-1}){\mathcal {G}}^{\top } {\mathcal {R}}^{-\frac{1}{2}}\), the ensemble expression \({\bar{P}}^{\dag \frac{1}{2}}(t_0\vert t_{-1}){\bar{P}}_{xy}^f\mathcal {{\bar{R}}}^{-\frac{1}{2}}\) possesses the clear advantage that the calculation of \({\bar{P}}_{xy}^f\) does not require the explicit form of \({\mathcal {G}}\). It can therefore be coded line by line, which makes our approach computationally more efficient.
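The identity (31) can be checked on a small synthetic ensemble. The sketch assumes that \({\bar{P}}_{xy}^f\) is the sample cross covariance between the initial-state anomalies and their images in observation space; the matrix `G` is used only to generate those images and to evaluate the right-hand side for comparison. All sizes and random inputs are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, q = 8, 3, 5                        # q < n, so the ensemble covariance is singular
X = rng.standard_normal((n, q))
X -= X.mean(axis=1, keepdims=True)       # ensemble anomalies of the initial state
G = rng.standard_normal((m, n))          # used only to generate synthetic data
Y = G @ X                                # anomalies mapped to observation space
R_inv_half = np.diag(1.0 / np.sqrt(rng.uniform(0.5, 2.0, m)))

P = X @ X.T / (q - 1)                    # ensemble covariance P-bar(t_0 | t_{-1})
P_xy = X @ Y.T / (q - 1)                 # ensemble cross covariance P-bar_xy^f

# Square root and pseudo-inverse square root from the eigendecomposition.
w, Q = np.linalg.eigh(P)
keep = w > 1e-10 * w.max()
P_half = (Q[:, keep] * np.sqrt(w[keep])) @ Q[:, keep].T
P_pinv_half = (Q[:, keep] / np.sqrt(w[keep])) @ Q[:, keep].T

lhs = P_pinv_half @ P_xy @ R_inv_half    # needs only ensemble statistics
rhs = P_half @ G.T @ R_inv_half          # needs G explicitly

s_bar = np.linalg.svd(lhs, compute_uv=False)
en_dfs = float(np.sum(s_bar**2 / (1.0 + s_bar**2)))
```

In an application the anomalies `Y` would come from running the model and the observation operator on each ensemble member, so `G` never has to be assembled.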

Analogous to Sect. 3, we can define the ensemble degree of freedom for signal (EnDFS) as \(\Vert {\bar{P}}\Vert _1\) and consider each element on the diagonal of \({\bar{P}}\) as the contribution to EnDFS of the corresponding model state.

In practice, \({\bar{P}}(t_0|t_{-1})\) is typically singular, thus we have

$$\begin{aligned}&{\bar{P}}^{\dag \frac{1}{2}}(t_0\vert t_{-1})({\bar{P}}(t_0\vert t_{-1})-{\bar{P}}(t_0\vert t_N)){\bar{P}}^{\dag \frac{1}{2}}(t_0\vert t_{-1})\nonumber \\&= V_0{\hat{S}}_0^\dag V_0^{\top }(V_0{\hat{S}}_0^2V_0^{\top })V_0{\hat{S}}_0^\dag V_0^{\top }-{\bar{P}}^{\dag \frac{1}{2}}(t_0\vert t_{-1}){\bar{P}}(t_0\vert t_N){\bar{P}}^{\dag \frac{1}{2}}(t_0\vert t_{-1})\nonumber \\ &= V_0I_{r_0}V_0^{\top }-{\bar{P}}^{\dag \frac{1}{2}}(t_0\vert t_{-1}){\bar{P}}(t_0\vert t_N){\bar{P}}^{\dag \frac{1}{2}}(t_0\vert t_{-1}), \end{aligned}$$
(32)

where \(I_{r_0}\) is the diagonal matrix with the diagonal \(({\mathbf {1}}_{1\times r_0},0_{1\times (n-r_0)})\). It is clear from (60) that \({\bar{P}}^{\dag \frac{1}{2}}(t_0\vert t_{-1}){\bar{P}}(t_0\vert t_N){\bar{P}}^{\dag \frac{1}{2}}(t_0\vert t_{-1})\) is still a nonnegative definite matrix.

Thus, the ensemble relative degree of freedom for signal (EnRDFS) is defined by

$$\begin{aligned} {\bar{p}}=\dfrac{\Vert {\bar{P}}\Vert _1}{\Vert I_{r_0}\Vert _1}=\dfrac{\Vert {\bar{P}}\Vert _1}{r_0}\in [0,1). \end{aligned}$$
(33)
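The normalization by the rank \(r_0\) in (33) can be sketched as follows. The function assumes a diagonal observation error covariance and sample covariances with the factor \(1/(q-1)\); these choices, the toy observation operator and the name `ensemble_relative_dfs` are assumptions made for the illustration.

```python
import numpy as np

def ensemble_relative_dfs(X, Y, r_diag):
    """EnDFS, EnRDFS and rank r_0 from anomalies X (n x q) and Y (m x q).

    r_diag holds the diagonal of the (assumed diagonal) observation error covariance.
    """
    q = X.shape[1]
    P = X @ X.T / (q - 1)
    P_xy = X @ Y.T / (q - 1)
    w, Q = np.linalg.eigh(P)
    keep = w > 1e-10 * w.max()
    r0 = int(keep.sum())                 # rank of P-bar(t_0 | t_{-1})
    P_pinv_half = (Q[:, keep] / np.sqrt(w[keep])) @ Q[:, keep].T
    # Dividing the columns by sqrt(r_diag) applies R^{-1/2} from the right.
    s = np.linalg.svd(P_pinv_half @ P_xy / np.sqrt(r_diag), compute_uv=False)
    en_dfs = float(np.sum(s**2 / (1.0 + s**2)))
    projector_trace = float(np.trace(P_pinv_half @ P @ P_pinv_half))
    return en_dfs, en_dfs / r0, r0, projector_trace

rng = np.random.default_rng(5)
n, m, q = 10, 2, 6
X = rng.standard_normal((n, q))
X -= X.mean(axis=1, keepdims=True)
H = np.zeros((m, n))
H[0, 1] = 1.0
H[1, 7] = 1.0                            # two directly observed states
Y = H @ X
en_dfs, en_rdfs, r0, proj_tr = ensemble_relative_dfs(X, Y, np.array([0.5, 0.5]))
```

With mean-removed anomalies of \(q\) members the rank is at most \(q-1\), and the trace of the projector in (32) equals that rank, which motivates dividing by \(r_0\) rather than by \(n\).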

In order to distinguish the potential observabilities for the concentration and emission rates, the ensemble relative ratios of DFS remain

$$\begin{aligned} {\bar{p}}^c=\frac{\Vert {\bar{P}}^c\Vert _1}{\Vert {\bar{P}} \Vert _{1}}, \quad {\bar{p}}^e=\frac{\Vert {\bar{P}}^e\Vert _1}{\Vert {\bar{P}} \Vert _{1}}. \end{aligned}$$
(34)

5 The sensitivity of observation networks

The above discussion about DFS aims to evaluate the capacity of a predefined measurement network to optimize the initial value and emission rates simultaneously. In Appendix D, independent of any concrete data assimilation method, we use the singular vector approach [see e.g. Buizza and Montani (1999), Buizza and Palmer (1995), Liao et al. (2006)] to identify sensitive directions of observation networks to the initial value and emission rates separately and show the association with Sect. 3.

From Appendix D, we can see that the singular value \(s_k\) gives the amplification of the impact of the initial state on the observation configurations over the entire time interval. The associated singular vector in the state space, \(v_k\), is the direction of the \(\text {k}^{\text {th}}\) largest growth of the perturbation of observations evolving from the initial perturbation. With the special choice \(W_0=P^{-1}(t_0\vert t_{-1})\) and \({\mathcal {W}}={\mathcal {R}}^{-1}\), we compare the sensitivity analysis with the discussion in Sect. 3. It is clear that the vector \(v_k\) also points in the \(\text {k}^{\text {th}}\) direction that maximizes the relative improvement of estimates based on the Kalman smoother. This indicates that the states with higher contributions to the DFS coincide with the states that are more sensitive to the observation networks. Besides, the leading singular value \(s_1\) is related to the operator norm of \({\tilde{P}}\) as

$$\begin{aligned} \Vert {\tilde{P}}\Vert =\max _{\Vert x\Vert =1}\Vert {\tilde{P}}x\Vert =\frac{s_1^2}{1+s_1^2}, \end{aligned}$$
(35)

which implies that \({\tilde{P}}\) is bounded from above. This allows us to approximate and target sensitive parameters or areas using the leading singular vectors weighted by the corresponding singular values.
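Relation (35) and the approximation by leading singular vectors can be illustrated numerically. A random matrix again stands in for \(P^{\frac{1}{2}}(t_0\vert t_{-1}){\mathcal {G}}^{\top }{\mathcal {R}}^{-\frac{1}{2}}\), and the truncation level `k` is an arbitrary choice for the sketch.

```python
import numpy as np

rng = np.random.default_rng(6)
B = rng.standard_normal((6, 4))          # stand-in for P^{1/2} G^T R^{-1/2}
V, s, _ = np.linalg.svd(B, full_matrices=False)
weights = s**2 / (1.0 + s**2)
P_tilde = (V * weights) @ V.T

op_norm = np.linalg.norm(P_tilde, 2)     # spectral norm of P-tilde
bound = s[0]**2 / (1.0 + s[0]**2)        # right-hand side of (35)

# Approximate the per-state contributions with the k leading singular vectors.
k = 2
approx = (V[:, :k]**2) @ weights[:k]
exact = np.diag(P_tilde)
```

Since every omitted term is nonnegative, the truncated sum never exceeds the exact contribution, so states flagged as sensitive by a few leading vectors remain sensitive in the full analysis.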

Moreover, due to the homogeneity of the atmospheric transport model state vector extended with emissions, the above sensitivity analysis can be easily applied by dividing singular vectors into the block form according to the dimensions of the initial state and emissions. The corresponding blocks of different singular vectors indicate the different sensitive directions of the initial state and emissions and allow for this relative quantification. Correspondingly, we can approximate and target parameters sensitive to the existing observation networks for both the initial value and emission rates, respectively.

6 Experiment

In this section, we apply the approaches of Sects. 4 and 5 to an elementary advection-diffusion model to show how to assess the potential observability of concentrations and emission rates through EnDFS. We can see how it helps to identify the sensitive parameters of both concentrations and emission rates to the given observations. We consider a linear advection-diffusion model with Dirichlet horizontal (lateral) boundary condition and Neumann lower (surface) boundary condition in the vertical direction on the domain \([0,14]\times [0,14]\times [0,4]\) as follows,

$$\begin{aligned} \dfrac{\partial \delta c}{\partial t}=-v_x\dfrac{\partial \delta c}{\partial x}-v_y\dfrac{\partial \delta c}{\partial y}+\dfrac{\partial }{\partial z}(K(z)\dfrac{\partial \delta c}{\partial z}) +\delta e, \end{aligned}$$
(36)

where \(\delta c\), \(\delta e\) are the perturbations of the concentration and the emission rates respectively. K(z) is a differentiable function of height z.

In this example, the vertical coupling of horizontal grid layers is accomplished only by a diffusion operator, to avoid signal imprints that follow arbitrarily designed small-scale vertical advection patterns. This is considered valid, since the information loss caused by the diffusion-induced reduction of the signal-to-noise ratio is significantly stronger than in the case of advection.

We set the velocities \(v_x=v_y=0.5\) and the time step \(\triangle t=0.5\). The numerical solution is based on the symmetric operator splitting technique [see Yanenko (1971)] with the following operator sequence

$$\begin{aligned} \delta c(t+\triangle t)=T_xT_yD_zAD_zT_yT_x\delta c(t), \end{aligned}$$
(37)

where \(T_x\) and \(T_y\) are transport operators in horizontal directions x and y, and \(D_z\) is the diffusion operator in the vertical direction z. The parameters of emission and deposition rates are included in A. The Lax-Wendroff algorithm is chosen as the discretization method for horizontal advection with \(\triangle x=\triangle y=1\). The vertical diffusion is discretized with \(\triangle z=1\) by the Crank-Nicolson scheme with the Thomas algorithm [see Higham (2002)] as solver. The number of grid points is \(N_g=1125\).
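Two building blocks of the splitting (37) can be sketched in one dimension: a Lax-Wendroff advection step and a Thomas solver for the tridiagonal systems arising from Crank-Nicolson. Periodic boundaries are assumed here for brevity, whereas the experiment uses Dirichlet lateral boundary conditions. The Courant number 0.25 is the full-step value \(v\triangle t/\triangle x\) implied by the stated parameters; whether the split operators use half steps is not specified in the text. The test system is hypothetical.

```python
import numpy as np

def lax_wendroff_step(c, courant):
    """One Lax-Wendroff step for 1D advection with periodic boundaries (assumed)."""
    right, left = np.roll(c, -1), np.roll(c, 1)
    return (c - 0.5 * courant * (right - left)
            + 0.5 * courant**2 * (right - 2.0 * c + left))

def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] do not enter the solution."""
    n = len(diag)
    cp, dp = np.zeros(n), np.zeros(n)
    cp[0], dp[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):                # forward elimination
        denom = diag[i] - lower[i] * cp[i - 1]
        cp[i] = upper[i] / denom
        dp[i] = (rhs[i] - lower[i] * dp[i - 1]) / denom
    x = np.zeros(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):       # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# Advect a unit pulse on 15 grid points with v * dt / dx = 0.5 * 0.5 / 1.
c = np.zeros(15)
c[2] = 1.0
c_next = lax_wendroff_step(c, courant=0.25)

# Diagonally dominant tridiagonal test system, as in a Crank-Nicolson step.
lower = np.array([0.0, -0.1, -0.1, -0.1, -0.1])
diag = np.full(5, 1.2)
upper = np.array([-0.1, -0.1, -0.1, -0.1, 0.0])
rhs = np.arange(1.0, 6.0)
x = thomas_solve(lower, diag, upper, rhs)
```

With periodic boundaries the advection step conserves the total mass, and for a Courant number of exactly 1 it reduces to a shift by one grid cell, which gives two simple sanity checks for an implementation.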

With the same temporal and spatial discretization of the concentration, the background knowledge of the emission rates is given by \(e_b(t_n, i, j, l)\), \(n=1,\cdots , N\). We establish the discrete dynamic model of the emission rates according to (8)

$$\begin{aligned} \delta e(t_{n+1})=M_e(t_{n+1},t_n)\delta e(t_n), \quad n=1,\cdots , N, \end{aligned}$$
(38)

where \(M_e(t_{n+1},t_n)=e_b(t_{n+1})/ e_b(t_n).\)

In this section, we assume \(\delta d\) is constant over time and the observation operator H(t) mapping the state space to the observation space is a \(1\times 2N_g\) time-invariant matrix. In Wu et al. (2016), the convergence of the numerical solution based on the above splitting and discretization method to the original solution of the partial differential equation (36) has been proved.
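The grid size and the \(1\times 2N_g\) observation operator can be made concrete with a short sketch. It assumes unit grid spacing on \([0,14]\times [0,14]\times [0,4]\), a row-major ordering of grid points, concentrations stored before emission rates in the extended state, and a single direct concentration measurement at the grid point (12, 10, 0) with an emission source at (2, 2, 0). The ordering and the helper `flat_index` are illustrative assumptions.

```python
import numpy as np

nx, ny, nz = 15, 15, 5                   # grid points per direction, spacing 1
n_g = nx * ny * nz                       # number of grid points N_g

def flat_index(i, j, l):
    """Row-major index of grid point (i, j, l); the ordering is an assumption."""
    return (i * ny + j) * nz + l

# Single time-invariant observation of the concentration at (12, 10, 0).
H = np.zeros((1, 2 * n_g))
H[0, flat_index(12, 10, 0)] = 1.0

# Extended state: concentrations first, emission rates second.
x = np.zeros(2 * n_g)
x[flat_index(12, 10, 0)] = 0.8           # concentration perturbation at the site
x[n_g + flat_index(2, 2, 0)] = 3.0       # emission perturbation, not observed directly
y = H @ x
```

The emission perturbation does not appear in `y` at the initial time and can only be detected after the model has transported its effect to the observation site.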

In our simulations, we produce \(q=500\) samples (the ensemble size) for the initial concentration and for the emission rates, respectively, from pseudo-independent random numbers and make the states correlated by the moving average technique. Tests show that the computational cost of our approach increases linearly with the ensemble size. In the following, we present three different tests, aiming to demonstrate the roles of variable winds, emissions, and vertical diffusion.
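The sampling step described above can be sketched as follows. The averaging width, the periodic wrap-around at the domain edges, the two-dimensional grid, and the function names are assumptions made for illustration; the text does not specify them.

```python
import numpy as np

def moving_average(field, width, axis):
    """Centred moving average along `axis` with periodic wrap-around (odd width assumed)."""
    offsets = range(-(width // 2), width // 2 + 1)
    return sum(np.roll(field, k, axis=axis) for k in offsets) / width

def correlated_ensemble(q, shape, width=3, seed=0):
    """Draw q independent white-noise fields and correlate them in space by smoothing."""
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((q,) + tuple(shape))
    for axis in range(1, samples.ndim):    # smooth every spatial axis, not the ensemble axis
        samples = moving_average(samples, width, axis)
    return samples

q = 500                                     # ensemble size used in the text
ens = correlated_ensemble(q, (15, 15))
white = np.random.default_rng(0).standard_normal((q, 15, 15))

# Sample correlation between two neighbouring grid points, with and without smoothing
corr_smooth = np.corrcoef(ens[:, 5, 5], ens[:, 6, 5])[0, 1]
corr_white = np.corrcoef(white[:, 5, 5], white[:, 6, 5])[0, 1]
```

For a three-point average of white noise, the theoretical correlation between neighbouring points is 2/3, whereas the unsmoothed samples are uncorrelated up to sampling noise.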

Advection tests: The following part demonstrates the potential capacity and limits of the DFS analysis tool. The prototypical examples are designed to show the expected elementary outcomes of the following situations, which exhibit the effects of the assimilation window length in relation to the emission location: (i) an assimilation window too short to capture emission impacts at the observation site, (ii) an extended assimilation window with a balanced signal of the impacts of concentrations and emissions at the observation site, and (iii) a further increased assimilation window featuring a declining impact of initial values and a growing emission impact. The first elementary advection test (Figs. 1, 2, 3, 4, 5, 6, and 7) identifies the sensitivities of parameters subject to different wind directions and data assimilation windows through the EnDFS of each element of the concentrations and emission rates. Focusing on the advection effects, we apply the model with a weak diffusion process \((K(z)=0.5e^{-z^2})\).

Fig. 1 Advection test with \(10\triangle t\) data assimilation window (DAW) and southwesterly wind. The contributions from the concentration and emission to EnDFS are shown in the left and right figure panels respectively. The blue point located at (12, 10, 0) shows the time-invariant observation configuration. The blue point located at (2, 2, 0) is the source of the emission rate

Fig. 2 Advection test with \(35\triangle t\) DAW and southwesterly wind. Plotting conventions are as in Fig. 1

Fig. 3 Advection test with \(48\triangle t\) DAW and southwesterly wind. Plotting conventions are as in Fig. 1

Fig. 4 The panels in the first row show the singular values of the advection tests shown in the left panels of Figs. 1 to 3. The panels in the second row show the corresponding sensitivities of concentrations at the initial time approximated by 5 leading singular vectors

Fig. 5 Advection test with \(10\triangle t\) DAW and northeasterly wind. Plotting conventions are as in Fig. 1

Fig. 6 Advection test with \(35\triangle t\) DAW and northeasterly wind. Plotting conventions are as in Fig. 1

Fig. 7 Advection test with \(48\triangle t\) DAW and northeasterly wind. Plotting conventions are as in Fig. 1

In Figs. 1, 2, and 3 we assume southwesterly winds and data assimilation windows of \(10\triangle t\), \(35\triangle t\) and \(48\triangle t\), respectively. The computation times in our tests with these three assimilation windows are approximately 8.1 s, 28.5 s and 39.4 s, which confirms that the computational cost increases nearly linearly with the length of the data assimilation window. The contributions to EnDFS of the initial states are shown in the left panels of Figs. 1, 2, and 3. In the horizontal field at the lowest layer \((z=0)\), the optimized field of the concentration is enlarged as the data assimilation window is extended. This is because a larger domain of the concentration is controlled with longer data assimilation windows.
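The near-linear scaling can be checked directly from the three reported timings. The snippet below uses only the numbers quoted above; the variable names are illustrative.

```python
import numpy as np

windows = np.array([10, 35, 48])          # assimilation window lengths in time steps
times = np.array([8.1, 28.5, 39.4])       # reported computation times in seconds

per_step = times / windows                 # cost per time step for each window
slope, intercept = np.polyfit(windows, times, 1)   # least-squares straight line
```

The cost per time step stays within roughly 0.81 to 0.82 s for all three windows, and the fitted line has an intercept close to zero, which is what linear scaling predicts.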

The right panels of Figs. 1, 2, and 3 show the EnDFS of the emission rate at each grid point with \(z=0\). From the right panel of Fig. 1, we can observe that the contributions to EnDFS from emissions are less than \(2\times 10^{-3}\). Compared with the left panel of Fig. 1, the EnDFS of the emissions is clearly smaller than the EnDFS of the concentration in the influenced area. This indicates that the observations cannot detect the emission rates within the \(10\triangle t\) data assimilation window. Thus, in this case only the initial values of the area adjacent to the observation site are optimized. The right panels of Figs. 2 and 3 show that emission rates play an increasingly important role in the impact of observations. In these two cases, we consider both the concentration and the emission rate as optimizable parameters. The quantitative balance between the concentration and emission rates is provided in Table 1.
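The shift of weight from initial values to emissions with growing window length can be reproduced in a deliberately small toy problem. The sketch below is not the EnDFS computation of this paper. It assumes a two-element state (one initial concentration \(c_0\) and one constant emission rate \(e\)) with identity background covariance, a hypothetical model \(c(t_n)=c_0+n\triangle t\,e\), one observation of \(c\) per time step with unit error variance, and the standard expression \(G^T(GG^T+R)^{-1}G\) whose diagonal gives the per-element contributions to the degree of freedom for signal.

```python
import numpy as np

def dfs_contributions(G, r_var=1.0):
    """Diagonal of G^T (G G^T + R)^{-1} G for R = r_var * I and identity background covariance."""
    S = G @ G.T + r_var * np.eye(G.shape[0])
    return np.diag(G.T @ np.linalg.solve(S, G))

def window_operator(n_steps, dt=0.5):
    """Stacked observation operator of the toy model: row n maps (c_0, e) to c(t_n)."""
    n = np.arange(1, n_steps + 1)
    return np.column_stack([np.ones(n_steps), n * dt])

def emission_share(n_steps):
    """Fraction of the total DFS attributed to the emission rate (toy definition)."""
    contrib = dfs_contributions(window_operator(n_steps))
    return contrib[1] / contrib.sum()

share_short = emission_share(1)    # one observation: emission barely constrained
share_long = emission_share(10)    # longer window: emission share grows
```

In this toy setting the emission share is 0.2 for a single observation and increases when ten observations are used, mirroring qualitatively the behaviour seen between the short and the long assimilation windows.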

Table 1 Ensemble relative ratios of the initial value and emission rate at the lowest layer

The upper row panels of Fig. 4 show the singular values corresponding to the results shown in Figs. 1, 2 and 3. We approximate the sensitivities of the initial concentration by the five leading singular vectors, weighted by the associated singular values in the nuclear norm, and show the results in the lower row panels of Fig. 4. This verifies that the sensitive area can be well targeted by only a few singular vectors, although, unlike the DFS of the model, the sensitivity analysis cannot provide quantitative solutions with a clear statistical significance. Besides, in line with expectations, the area influenced by the observation configuration depends on the wind direction and the assimilation window length.
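One possible reading of this approximation is sketched below. The exact weighting used for Fig. 4 is defined earlier in the paper; here it is assumed that each of the \(k\) leading right singular vectors contributes its squared components, weighted by its singular value divided by the nuclear norm (the sum of all singular values). The input matrix and the function name are hypothetical.

```python
import numpy as np

def leading_sensitivity(S, k=5):
    """Approximate per-element sensitivity from the k leading right singular vectors of S."""
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    weights = s[:k] / s.sum()                       # singular values relative to the nuclear norm
    sens = (weights[:, None] * Vt[:k] ** 2).sum(axis=0)
    captured = s[:k].sum() / s.sum()                # share of the nuclear norm that is retained
    return sens, captured

# Hypothetical sensitivity matrix with five dominant and five weak directions
S = np.diag([5.0, 4.0, 3.0, 2.0, 1.0, 0.1, 0.1, 0.1, 0.1, 0.1])
sens, captured = leading_sensitivity(S)
```

In this example the five leading vectors retain most of the nuclear norm, and the sensitivity map vanishes on the elements that only the weak directions touch.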

As counterexamples, Figs. 5, 6, and 7 show the EnDFS of the concentration and emission rates under the same assumptions as Figs. 1, 2 and 3, respectively, except that a northeasterly wind is assumed. As expected, our approach demonstrates that with the adverse wind direction, emission rates are neither detectable nor improvable by the given observation configuration, whatever the duration of the assimilation window. The quantitative balances for the related figures are given in Table 1. It can be seen that the sensitivity to emission rate optimization remains equally low and is affected by numerical noise.

Emission signal tests: The purpose of the emission signal tests (Figs. 8 and 9) is to assess the impact of observation configurations on emission rates that evolve with different diurnal profiles. We make the same assumptions as for Fig. 3, except that the wind speed in Figs. 8 and 9 is increased such that the profiles of the emission rate are better detectable in relation to the observation within the assimilation window \(48\triangle t\). The only distinction between the situations in Figs. 8 and 9 is the pronounced diurnal cycle of the background profile of the emission rate during the assimilation window \(48\triangle t\). The different profiles of the emission rates correspond to different emitted amounts of the species during the data assimilation window. Table 1 clearly shows that a distinct variation of the emission rates during the data assimilation window acts to level \({\bar{p}}^c\) and \({\bar{p}}^e\), and thus helps to improve the optimization results.

Fig. 8 Emission signal test (weak) with \(48\triangle t\) DAW and southwesterly wind (\(v_x=1\) and \(v_y=1\)). Plotting conventions are as in Fig. 1

Fig. 9 Emission signal test (strong) with \(48\triangle t\) DAW and southwesterly wind (\(v_x=1\) and \(v_y=1\)). Plotting conventions are as in Fig. 1

Diffusion tests: The vertical exchange of trace gases can be described by advection and diffusion, depending on the nature of the process and the model grid resolution. In this study we confine our simulation to vertical diffusion as the only vertical coupling. The diffusion tests (Figs. 10, 11, and 12) aim to test our approach by comparing the EnDFS of the concentration and the emission rate at the layer \(z=0\) for a weak and a strong diffusion process. We assume that the observation configuration at each time step is located at (12, 10, 4) in the diffusion tests, with \(K(z)=0.5e^{-z^2}\) in Fig. 10 and \(K(z)=0.5e^{-z^2}+1\) in Fig. 11. Besides, Figs. 10 and 12 preserve the same assumptions as Fig. 3.

Fig. 10 Diffusion test (weak) with \(48\triangle t\) DAW and southwesterly wind. Plotting conventions are as in Fig. 1

Fig. 11 Diffusion test (strong) with \(48\triangle t\) DAW and southwesterly wind. Plotting conventions are as in Fig. 1

Fig. 12 The panels in the first row show singular values of the diffusion tests shown in the left panels of Figs. 10 and 11. The panels in the second row show sensitivities of concentrations at the initial time approximated by 5 leading singular vectors

Figs. 3 and 10 show that the different observation locations strongly influence the distribution of the concentration. Table 2 shows that, with the same diffusion coefficient, the EnDFS of the concentration in the lowest layer in Fig. 3 is clearly larger than that in Fig. 10. Moreover, Table 1 shows that the observation configuration at the top layer is not effective for emission rates with such weak diffusion within the \(48\triangle t\) data assimilation window.

Table 2 The ensemble degrees of freedom for signal of the initial concentration and emission at the lowest layer

Comparing Fig. 10 with Fig. 11, we can see how the EnDFS of the concentration and emission rates increases with the stronger diffusion process. The increasing impact of the observation configuration with stronger diffusion is also verified by the EnDFS and the ensemble relative ratios of DFS of the concentration and emission rate for Figs. 10 and 11 in Table 2. The balances between the concentration and emission rate for Figs. 10 and 11 are shown in Table 1. The significant difference in the weights of emission rates in Table 1 implies that the observation configuration cannot detect emission rates at the lowest layer with the weak diffusion in Fig. 10, while with the stronger diffusion in Fig. 11 both the concentration and the emission rates should be considered as optimized parameters with the corresponding weights.
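A single-column calculation illustrates why the top-layer observation barely sees the lowest layer under weak diffusion. The sketch below propagates a unit perturbation released at \(z=0\) through \(48\) Crank-Nicolson steps and reports the fraction found in the top layer. It assumes five layers (\(z=0,\dots,4\)), zero-flux boundaries, and \(K\) evaluated at the layer interfaces, none of which is stated explicitly in the text; it is a forward transport illustration and not the EnDFS computation itself.

```python
import numpy as np

def cn_matrix(K_half, dt, dz=1.0):
    """Crank-Nicolson propagator for zero-flux vertical diffusion on one column."""
    nz = K_half.size + 1
    L = np.zeros((nz, nz))
    for k, K in enumerate(K_half):         # interface between layers k and k+1
        L[k, k] -= K
        L[k, k + 1] += K
        L[k + 1, k + 1] -= K
        L[k + 1, k] += K
    L /= dz**2
    I = np.eye(nz)
    return np.linalg.solve(I - 0.5 * dt * L, I + 0.5 * dt * L)

def top_layer_fraction(K_half, n_steps=48, dt=0.5):
    """Fraction of a unit surface perturbation that reaches the top layer."""
    c = np.zeros(K_half.size + 1)
    c[0] = 1.0
    P = cn_matrix(K_half, dt)
    for _ in range(n_steps):
        c = P @ c
    return c[-1]

z_half = np.arange(4) + 0.5                          # interface heights (assumption)
weak = top_layer_fraction(0.5 * np.exp(-z_half**2))            # K(z) = 0.5 exp(-z^2)
strong = top_layer_fraction(0.5 * np.exp(-z_half**2) + 1.0)    # K(z) = 0.5 exp(-z^2) + 1
```

With the weak profile the diffusion coefficient decays so quickly with height that practically nothing reaches the top layer within the window, whereas the strong profile mixes the column almost uniformly.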

Finally, similar to Fig. 4, the singular values for Figs. 10 and 11 and the approximate targeting results for the sensitive parameters are shown in Fig. 12. It shows that the sensitive parameters can also be captured by a few leading singular vectors in the diffusion tests.

7 Conclusions and outlook

In this study we extended the transport-diffusion models forced by emission rates in a novel way. Based on the Kalman smoother, we developed an approach to quantitatively identify the impact of a given observation network on the optimization of the initial trace gas state and emission rates. The contribution to the degree of freedom for signal is adopted as a criterion to assess the potential observability of each element in the extended state vector. The degree of freedom for signal and a number of related metrics were taken as quantitative measures of the extent to which the parameters can be optimized, in advance of the data assimilation procedure. This provides the opportunity to select suitable and sensitive parameters so that the optimization can be carried out more efficiently. The ensemble version of the approach makes it feasible to assess the potential observability jointly for initial values and emission rates for high-dimensional models in practical applications. Besides, we formulated the sensitivities of observational networks by seeking the fastest growing directions of the perturbation ratio between initial states and observation configurations during the entire time interval. This facilitates targeting the parameters that are sensitive to the observation networks by a few leading singular values and vectors, so that the computational costs can be further reduced. A series of experiments based on an elementary advection-diffusion model illustrated the significance of our approach in different situations.

In the future, we plan to apply this approach to a real atmospheric transport model to solve practical network validation problems prior to the solution of the inversion task, as long as the tangent linear assumption remains valid.