The assessment of potential observability for joint chemical states and emissions in atmospheric modelings

In predictive geophysical model systems, uncertain initial values and model parameters jointly influence the temporal evolution of the system. Optimizing initial values alone, as traditional data assimilation methods do, is therefore insufficient. However, blindly extending the set of optimized parameters jeopardizes the validity of the resulting analysis, because it increases the ill-posedness of the inversion task. Hence, it becomes important to assess, in advance of the optimization, the potential observability of model states and parameters by a given measurement network in atmospheric modeling. In this paper, we establish a dynamic model of emission rates and extend the transport-diffusion model by these emission rates. Taking the Kalman smoother as the underlying assimilation technique, we develop a quantitative method to jointly evaluate the potential observability of initial values and emission rates and the sensitivity of observation networks to them. This allows us to determine which parameters can be optimized with a given observation configuration before the data assimilation procedure is run, and thus makes the optimization more efficient. For high-dimensional models in practical applications, we derive an ensemble-based version of the approach and illustrate it with several elementary experiments.


Introduction
Climate change and air quality are influenced by fluxes of greenhouse gases, reactive gas emissions and aerosols. The temporal evolution of reactive chemistry in the atmosphere is usually modeled by atmospheric chemistry transport models. Poorly known initial values, sources and sinks seriously degrade the quality of the simulation, a problem addressed by data assimilation and inverse modeling (e.g. Sandu and Chai (2011)). In practical data assimilation, the number of observations is typically markedly lower than the number of degrees of freedom of the model (see Daley (1991)). In order to improve the quality of data assimilation and inverse modeling results, several aspects can be considered.
Firstly, the observation network can be optimized subject to given external constraints, which has traditionally been addressed by Observation System Simulation Experiments [OSSEs, e.g. Daley (1991)]. The advanced concept of targeted observations was popularized during the FASTEX campaign [e.g. Langland et al. (1999), Szunyogh et al. (1999)]. Theoretical studies are presented, for example, by Berliner et al. (1999), or recently by Bellsky et al. (2014) for a case study of highly nonlinear dynamics and by Wu et al. (2016) for the optimal deployment of observations for a time-varying system in an infinite-dimensional domain within a finite-time interval. Secondly, the benefit of individual observations or types of measurements has been assessed by Cardinali et al. (2004) and a sequence of related papers. Thirdly, the need to quantify the information provided by the observations can be satisfied by suitably selected measurements. Singular value decomposition (SVD) is a well-known tool applied to identify the priorities of observations by detecting the fastest growing uncertainties in meteorological models (e.g. Bousserez and Henze (2018), Buizza and Palmer (1995), Kang and Xu (2012), Khattatov et al. (1999), Liao et al. (2006), Lorenz (1965), Sandu et al. (2013), Singh et al. (2013), Spantini et al. (2015)). Besides, due to the importance of the sensitivity analysis of the model evolution with respect to the model errors and the observation network, the concept of the degree of freedom for signal (DFS) is also frequently applied to satellite retrieval problems, which are typically of significantly lower dimension than data assimilation problems [see for example Eyre (1990), Fisher (2003), Martynenko et al. (2010), Rabier et al. (2002), Rodgers (2000)]. Several methodologies have been proposed to account for model and estimation errors in both variational and ensemble data assimilation [e.g. Bellsky et al. (2014), Daescu and Navon (2004), Li et al. (2009), Sharma et al. (2014), Smith et al. (2013), Zupanski et al. (2007)]. Navon (1997) outlined the perceptibility and stability in optimal parameter estimation in meteorology and oceanography. Cioaca and Sandu (2014b) introduced a general framework to optimize a set of parameters controlling the 4D-Var data assimilation system, which was applied to the state and other parameters of a shallow water model in Cioaca and Sandu (2014a).
Most studies cited above were based on the classical data assimilation problem, in which the initial value or the prognostic state variable is the only parameter to be optimized. However, for chemistry transport or greenhouse gas models driven by emissions in the troposphere, the optimization of emission rates is at least as important as that of the initial value. In order to obtain a better analysis from combining the model with observations, efforts at joint optimization have been made by adding emission rates to concentrations in a number of studies, such as Bocquet and Sakov (2013), Elbern and Schmidt (1999), Elbern et al. (2000), Elbern et al. (2007), Goris and Elbern (2013), Goris and Elbern (2015), Miyazaki et al. (2012), Winiarek et al. (2014). However, the inability to observe and estimate surface emission fluxes directly is still a major roadblock hampering progress in the predictive skill of climate and atmospheric chemistry models. For example, studies focusing on urban area emissions interacting with longer range transport need to discriminate between initial values driven by external emissions, distant emissions like biomass burning, and local emissions [e.g. Duarte et al. (2021), Kaskaoutis et al. (2014), Kumar et al. (2019)]. In all cases, the question of model and emission uncertainty is crucial, as it blurs the model-based capacity to discriminate between simulations controlled by initial values and those controlled by emission rates. This is especially challenging in the case of biogenic emissions, which require related knowledge of plant and soil properties [e.g. Vogel and Elbern (2021a), Vogel and Elbern (2021b)].
Therefore, in this paper we establish a dynamic model of emission rates and extend the traditional chemistry transport model by it. This gives the initial value and the emission rates the same importance in the optimization. Based on the extended model, we investigate an approach to identify and assess the potential observability and sensitivity of any given observation configuration to the initial value and emission rates individually for atmospheric transport-diffusion models, taking the (ensemble) Kalman smoother as the underlying data assimilation method. Further, the sensitivities to the initial value and emission rates give us the opportunity to quantitatively balance the weights between them. This helps us to determine the sensitive parameters and to quantitatively build the initial variance matrices for both concentrations and emission rates in advance of the data assimilation process, so that computational resources can be saved and the accuracy of the optimization can be improved.
The rest of the paper is organized as follows. In Sect. 2, we establish the dynamic model for emissions and extend the original atmospheric transport model by emission rates in a novel way. In Sect. 3, based on the Kalman smoother, we present the specific approach to determine the degree of freedom for signal for both initial values and emission rates. In Sect. 4, we develop the ensemble approach to evaluate the degree of freedom for signal of the initial value and emission rates based on the non-singularity of the background covariance matrix. In Sect. 5, we identify the sensitive directions of the initial values and emission rates separately by maximizing the ratio of the magnitudes of the observation perturbation and the initial perturbation. This also makes it possible to estimate the sensitivities of the initial value and emission rates from a few leading singular values and singular vectors. In Sect. 6, we extend a 3D advection-diffusion equation with the dynamic model of the emission rate and give several elementary experiments to verify and demonstrate the ensemble approach. Finally, in Sect. 7, we summarize the main contributions of this paper and discuss possible extensions.

Atmospheric inverse modeling extended by emission rates
We usually describe the concentration change rate by the following prognostic atmospheric transport model
$$\frac{dc}{dt} = \mathcal{A}(c) + e(t), \tag{1}$$
where $\mathcal{A}$ is a nonlinear model operator, and $c(t)$ and $e(t)$ are the state vectors of chemical constituents and emission rates at time $t$, respectively. The prior estimate of the state vector of concentrations $c(t)$ is given and denoted by $c_b(t)$, termed the background state. The prior estimate of emission rates, usually taken from emission inventories, is denoted by $e_b(t)$.
Let $A$ be the tangent linear operator of $\mathcal{A}$, $\delta c(t_0) = c(t_0) - c_b(t_0)$ and $\delta e(t) = e(t) - e_b(t)$. The linear evolution of the perturbation of $c(t)$ follows the tangent linear model
$$\frac{d\,\delta c}{dt} = A\,\delta c + \delta e(t). \tag{2}$$
By discretizing the tangent linear model in space, it is straightforward to obtain the linear solution of (2), discrete in space and continuous in time, as
$$\delta c(t) = M(t, t_0)\,\delta c(t_0) + \int_{t_0}^{t} M(t, s)\,\delta e(s)\,ds, \tag{3}$$
where $M(\cdot,\cdot)$ is the resolvent obtained from the spatial discretization of $A$. Without loss of generality, we assume $\delta c(t) \in \mathbb{R}^n$ and $\delta e(t) \in \mathbb{R}^n$, where $n$ is the dimension of the partial phase space of concentrations and emission rates. Obviously, $M(\cdot,\cdot) \in \mathbb{R}^{n \times n}$. In addition, let $y(t)$ be the observation vector of $c(t)$ and define
$$\delta y(t) = y(t) - \mathcal{H}(t)\,c_b(t), \tag{4}$$
where $\delta y(t) \in \mathbb{R}^{m(t)}$ and $m(t)$ is the dimension of the phase space of observation configurations at time $t$. $\mathcal{H}(t)$ is a nonlinear forward observation operator mapping the model space to the observation space. Linearizing the nonlinear operator $\mathcal{H}$ as $H$, we present the observation system as
$$\delta y(t) = H(t)\,\delta c(t) + \nu(t), \tag{5}$$
where the Gaussian observation error $\nu(t)$ has zero mean and covariance $R(t) \in \mathbb{R}^{m(t) \times m(t)}$. The Kalman smoother is a recursive estimator that provides the best linear unbiased estimates (BLUE) of the unknown variables, together with error estimates, using a sequence of observations [e.g. Gelb (1974)]. Compared with 4D-Var approaches, Kalman smoothers not only provide the best linear unbiased estimate of the state vector from a series of observations over time, but also update the forecast error covariances of that estimate.
Clearly, if the initial state of concentrations is the only parameter to be optimized, we need only consider the concentrations as model states and can apply the Kalman filter and smoother directly to the tangent linear model (2) with observations (5). However, in most cases the exact values of emission rates are poorly known and must also be treated as parameters to be optimized. It has been shown by Elbern et al. (2007) that the diurnal profiles are better known than the exact amplitude of emission rates. Hence, we consider only the amplitude of the diurnal emission cycle as optimization parameter. Thus, we first reformulate the background evolution of emission rates from time $s$ to $t$ in a dynamic form as an emission model
$$e_b(t) = M_e(t, s)\,e_b(s), \tag{6}$$
where $e_b(\cdot)$ is an $n$-dimensional vector, the $i$-th element of $e_b(\cdot)$ is denoted by $e_b^i(\cdot)$, and $M_e(t, s)$ is the diagonal scaling matrix defined as
$$M_e(t, s) = \mathrm{diag}\left(\frac{e_b^1(t)}{e_b^1(s)}, \ldots, \frac{e_b^n(t)}{e_b^n(s)}\right). \tag{7}$$
We assume that the amplitude of emission rates is the only parameter to be optimized and then establish the dynamic model of emission rates subject to the above background evolution,
$$\delta e(t) = M_e(t, s)\,\delta e(s), \quad s \le t. \tag{8}$$
Several studies [e.g. Gelb (1974)] stated that the estimate of a variable $x$ by the fixed-interval Kalman smoother generally equals the conditional expectation based on the observations within the entire time interval, denoted by $E[x \mid \{y(t_{obs}),\ t_{obs} \in [t_0, t_N]\}]$. With the emission model (8), the estimate of $e(t)$ by the Kalman smoother on $[t_0, t_N]$ follows from the linearity of the conditional expectation,
$$E[e(t) \mid \{y(t_{obs}),\ t_{obs} \in [t_0, t_N]\}] = E[M_e(t, s)\,e(s) \mid \{y(t_{obs}),\ t_{obs} \in [t_0, t_N]\}] = M_e(t, s)\,E[e(s) \mid \{y(t_{obs}),\ t_{obs} \in [t_0, t_N]\}].$$
This implies that the BLUEs of emission rates with the dynamic model (8) preserve the diurnal profiles of the background emission rates.
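The diagonal scaling of the emission model can be illustrated with a minimal NumPy sketch. The function name `emission_scaling` and all numerical values are hypothetical and chosen only for illustration. The sketch shows that a perturbation of the emission amplitude keeps the same relative size when it is propagated with the scaling matrix, which is the sense in which the diurnal profile is preserved.

```python
import numpy as np

def emission_scaling(e_b_t, e_b_s):
    """Diagonal scaling matrix with entries e_b^i(t) / e_b^i(s)."""
    e_b_t = np.asarray(e_b_t, dtype=float)
    e_b_s = np.asarray(e_b_s, dtype=float)
    if np.any(e_b_s == 0.0):
        raise ValueError("background emissions at time s must be nonzero")
    return np.diag(e_b_t / e_b_s)

# Hypothetical background emissions at times s and t for n = 3 grid cells.
e_b_s = np.array([2.0, 4.0, 1.0])
e_b_t = np.array([3.0, 2.0, 1.5])
M_e = emission_scaling(e_b_t, e_b_s)

# A 10 % amplitude perturbation at time s ...
de_s = 0.1 * e_b_s
# ... is propagated to time t by the scaling matrix ...
de_t = M_e @ de_s
# ... and is still a 10 % perturbation of the background at time t.
relative_perturbation = de_t / e_b_t
```

The explicit check for zero background emissions is a design choice of this sketch; grid cells without emissions would have to be excluded from the control vector or treated separately.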
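By rewriting (3) as
$$\delta c(t) = M(t, t_0)\,\delta c(t_0) + \int_{t_0}^{t} M(t, s)\,M_e(s, t_0)\,\delta e(t_0)\,ds, \tag{9}$$
we obtain the transport model with the state vector extended by emission rates,
$$\begin{pmatrix} \delta c(t) \\ \delta e(t) \end{pmatrix} = \begin{pmatrix} M(t, t_0) & \int_{t_0}^{t} M(t, s)\,M_e(s, t_0)\,ds \\ 0_{n \times n} & M_e(t, t_0) \end{pmatrix} \begin{pmatrix} \delta c(t_0) \\ \delta e(t_0) \end{pmatrix}. \tag{10}$$
Typically, there is no direct observation of emissions, apart from the flux tower observations used for carbon dioxide, which are not considered here. Therefore, we can reformulate the observation mapping as
$$\delta y(t) = \begin{pmatrix} H(t) & 0_{m(t) \times n} \end{pmatrix} \begin{pmatrix} \delta c(t) \\ \delta e(t) \end{pmatrix} + \nu(t), \tag{11}$$
where $0_{n \times n}$ and $0_{m(t) \times n}$ denote matrices of the indicated dimensions with zero elements.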
It is now clear that both concentrations and emission rates are included in the state vector of the homogeneous model (10). This allows us to apply the Kalman smoother on a fixed time interval $[t_0, t_N]$ in order to optimize both parameters. Besides, a more general case of the transport model extended by emissions is shown in Appendix A.
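To make the structure of the extended model concrete, the following sketch assembles a one-step extended transition matrix and the matching observation operator for a toy problem. It is only an illustration under stated assumptions: the integral coupling between emissions and concentrations is replaced by a simple forward Euler term `dt * I` over one time step, and the names `extended_step` and `extended_observation` as well as all numbers are hypothetical.

```python
import numpy as np

def extended_step(M, M_e, dt):
    """One-step transition matrix for the stacked state (dc, de).

    Assumption of this sketch: the emission forcing over one step is
    approximated by dt * de (forward Euler) instead of the exact integral.
    """
    n = M.shape[0]
    top = np.hstack([M, dt * np.eye(n)])          # dc is transported and forced by de
    bottom = np.hstack([np.zeros((n, n)), M_e])   # de only follows the emission model
    return np.vstack([top, bottom])

def extended_observation(H):
    """Observation operator on the stacked state: emissions are not observed."""
    m, n = H.shape
    return np.hstack([H, np.zeros((m, n))])

# Hypothetical toy problem with n = 2 grid cells and one observation.
M = np.array([[0.9, 0.1],
              [0.0, 0.9]])
M_e = np.diag([1.5, 0.5])
H = np.array([[0.0, 1.0]])
dt = 0.5

dc0 = np.array([1.0, 0.0])
de0 = np.array([0.2, 0.4])
x0 = np.concatenate([dc0, de0])

x1 = extended_step(M, M_e, dt) @ x0       # stacked perturbation after one step
dy1 = extended_observation(H) @ x1        # what the observation sees
```

Because the lower-left block is zero, the emission perturbation evolves independently of the concentrations, while the observation only reaches the emissions through the upper-right coupling block.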

The degree of freedom for signal of concentrations and emissions
In this section we introduce the theoretical approach to determine the DFS of concentrations and emissions, resting on the extended model in Sect. 2. This approach allows us to determine the potential ability of observations to optimize each variable of the above extended model, based on the Kalman smoother within a finite-time interval. For convenience, we generalize the atmospheric transport model (10) to a discrete-time linear system on the time grid $t_0, t_1, \ldots, t_N$, with an additive model error in the state propagation and an additive observation error in the linear observation equation, where $x(\cdot) \in \mathbb{R}^n$ is the state variable and $y(t_k) \in \mathbb{R}^{m(t_k)}$ is the observation vector at time $t_k$. The Gaussian model error $\varepsilon(t_k)$ and observation error $\nu(t_k)$, $k = 1, \ldots, N$, have zero means. The model error covariance matrix is denoted by $Q(t_k)$ and the observation error covariance matrix is denoted by $R(t_k)$. According to Appendix B, applying the singular value decomposition $P^{\frac{1}{2}}(t_0|t_{-1})\,G^\top R^{-\frac{1}{2}} = V S U^\top$, we obtain
$$\tilde{P} = \sum_{i} \frac{s_i^2}{1 + s_i^2}\, v_i v_i^\top, \tag{14}$$
where $v_i$ is the $i$-th left singular vector in $V$ related to the singular value $s_i$, which is the $i$-th element on the diagonal of $S$.
It is clear that the trace of $\tilde{P}$ can be used to evaluate the total improvement of the model states. Thus, the nuclear norm is appropriately taken as the metric, which is defined as
$$\|A\|_1 = \mathrm{tr}\left(\sqrt{A^\top A}\right),$$
where $A$ is any matrix and $\mathrm{tr}(\cdot)$ denotes the trace of the matrix.
From (14), we obtain
$$\|\tilde{P}\|_1 = \sum_{i} \frac{s_i^2}{1 + s_i^2}.$$
This is well known as the degree of freedom for signal (DFS) of the model (e.g. Rodgers (2000)).
It is obvious that $\|\tilde{P}\|_1 < \|I\|_1 = n$. Here $n$ can be considered as the total relative improvement if the system is completely observed. Thus, if we consider the ratio
$$\tilde{p} = \frac{\|\tilde{P}\|_1}{n}, \tag{17}$$
the percentage of the total improvement of the model is obtained, which is henceforth called the relative degree of freedom for signal.
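The computation of the DFS from the singular values can be sketched as follows. This is an illustration rather than the authors' implementation: it assumes that the relative improvement matrix is the prior-normalized reduction of the error covariance of a linear Gaussian update, so that its eigenvalues are $s_i^2/(1+s_i^2)$. The names `sym_sqrt` and `dfs_from_svd` and the random test matrices are hypothetical.

```python
import numpy as np

def sym_sqrt(A):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, Q = np.linalg.eigh(A)
    return (Q * np.sqrt(w)) @ Q.T

def dfs_from_svd(P, G, R):
    """DFS and per-state contributions from the SVD of P^(1/2) G^T R^(-1/2).

    P: prior covariance (n x n), G: linear map from the initial state to all
    observations in the window (m x n), R: observation error covariance (m x m).
    """
    B = sym_sqrt(P) @ G.T @ np.linalg.inv(sym_sqrt(R))
    V, s, _ = np.linalg.svd(B, full_matrices=False)
    weights = s**2 / (1.0 + s**2)       # each weight lies in [0, 1)
    contributions = (V**2) @ weights    # diagonal of the improvement matrix
    return weights.sum(), contributions

# Hypothetical small example with n = 4 states and m = 3 observations.
rng = np.random.default_rng(0)
n, m = 4, 3
A = rng.standard_normal((n, n))
P = A @ A.T + n * np.eye(n)
G = rng.standard_normal((m, n))
R = np.diag([0.5, 1.0, 2.0])

dfs, contributions = dfs_from_svd(P, G, R)
relative_dfs = dfs / n                  # share of the maximal improvement n
```

Since every weight is smaller than one and at most `m` singular values are nonzero, the DFS in this sketch can never exceed the number of observations in the window.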
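In order to get a deeper insight into the potential capacity of the observation network to improve the estimation of all model states, we consider the corresponding value on the diagonal of $\tilde{P}$ as the contribution to the degree of freedom for signal. We denote the $j$-th element on the diagonal of $\tilde{P}$ by $\tilde{P}_j$. From (47), the contribution of the $j$-th element of $x(t_0)$ to the degree of freedom for signal can be expressed as
$$\tilde{P}_j = \sum_{i} \frac{s_i^2}{1 + s_i^2}\, v_{ij}^2,$$
where $v_{ij}$ is the $j$-th element of $v_i$. Besides, we can see that (14) enables us to discriminate the DFS contributed by the different optimization parameters, which are here the emission rates and the initial value. Without loss of generality, we divide (14) into the following block matrix according to the dimensions of $c$ and $e$,
$$\tilde{P} = \begin{pmatrix} \tilde{P}^{c} & \tilde{P}^{ce} \\ \tilde{P}^{ec} & \tilde{P}^{e} \end{pmatrix},$$
with each singular vector $v_i$ partitioned correspondingly into $v_i^c$ and $v_i^e$. Further, the degrees of freedom for signal of the $j$-th elements in $c(t_0)$ and $e(t_0)$ are given by
$$\tilde{P}^c_j = \sum_{i} \frac{s_i^2}{1 + s_i^2}\, (v^c_{ij})^2, \qquad \tilde{P}^e_j = \sum_{i} \frac{s_i^2}{1 + s_i^2}\, (v^e_{ij})^2,$$
where $v^c_{ij}$ and $v^e_{ij}$ are the $j$-th elements of $v^c_i$ and $v^e_i$, respectively.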
Moreover, the degrees of freedom for signal of the concentration, $\|\tilde{P}^c\|_1$, and of the emission rates, $\|\tilde{P}^e\|_1$, are obtained by summing the respective diagonal contributions $\tilde{P}^c_j$ and $\tilde{P}^e_j$ over $j = 1, \ldots, n$. If there is no prior correlation between the initial concentration and emission rates, that is $P^{ce}(t_0|t_{-1}) = 0_{n \times n}$, relative degrees of freedom for signal can be defined for the concentration and emission rates in analogy to (17). From (17), these lie in $[0, 1)$ and seem like percentages of the relative improvements of concentration and emission rates, respectively. However, efficient observation networks ideally lead to values close to 1 for both of them. The reason is that the normalization of $\tilde{P}$ is only with respect to the extended covariance matrix $P(t_0|t_{-1})$ rather than to the covariance matrices $P^c(t_0|t_{-1})$ and $P^e(t_0|t_{-1})$ individually. The relative degree of freedom for signal therefore cannot serve our objective of distinguishing the observability of the concentration and emission rates. By observing the block form of $\tilde{P}$, we have
$$\|\tilde{P}\|_1 = \|\tilde{P}^c\|_1 + \|\tilde{P}^e\|_1.$$
Thus, in order to compare the potential improvements of the concentration and emission rates separately, we define a relative ratio of the degree of freedom for signal for the concentration and for the emission rates, denoted by $\tilde{p}_c$ and $\tilde{p}_e$, respectively. If the degree or relative degree of freedom for signal of the observation network within the assimilation window is almost zero, an improvement cannot be expected. In contrast, $\{\tilde{P}^c_j\}_{j=1}^{n}$ and $\{\tilde{P}^e_j\}_{j=1}^{n}$, which show the improvement of each parameter $j$ of concentrations and emission rates respectively, can help us determine which parameters can be expected to be optimized by the existing observation configurations. Furthermore, comparing $\tilde{p}_c$ with $\tilde{p}_e$, we can conclude that the estimate of the one with the larger relative ratio of the degree of freedom for signal can be improved more efficiently by the existing observation configurations than the other.
In other words, if $\tilde{p}_c > \tilde{p}_e$, the existing observation configuration is more sensitive to the initial values of concentrations. Conversely, if $\tilde{p}_c < \tilde{p}_e$, the observation configuration can improve the estimate of emission rates better. According to $\tilde{p}_c$ and $\tilde{p}_e$, the relative weights between the concentration and emission rates can be identified quantitatively. In a data assimilation context, where observations are in a weighted relation to the background, the BLUE favors those parameters with higher observation efficiency.
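The comparison between concentrations and emission rates can be sketched in a few lines. The sketch assumes, as an interpretation that is not confirmed by the text above, that the relative ratio of each block is its share of the total DFS. The function name `split_dfs` and the contribution values are hypothetical.

```python
import numpy as np

def split_dfs(contributions, n):
    """Split per-state DFS contributions of the stacked state (c, e).

    Returns the DFS of each block and, as an assumed definition,
    each block's share of the total DFS.
    """
    contributions = np.asarray(contributions, dtype=float)
    if contributions.shape != (2 * n,):
        raise ValueError("expected one contribution per concentration and emission state")
    dfs_c = contributions[:n].sum()
    dfs_e = contributions[n:].sum()
    total = dfs_c + dfs_e
    if total == 0.0:
        return dfs_c, dfs_e, np.nan, np.nan   # nothing is observable
    return dfs_c, dfs_e, dfs_c / total, dfs_e / total

# Hypothetical contributions for n = 3 concentration and 3 emission states.
contributions = [0.4, 0.2, 0.1, 0.05, 0.03, 0.02]
dfs_c, dfs_e, ratio_c, ratio_e = split_dfs(contributions, n=3)
more_sensitive = "initial concentrations" if ratio_c > ratio_e else "emission rates"
```

In this hypothetical case the observations constrain the initial concentrations far better than the emission rates, so the emission rates would carry little weight in a joint optimization.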
In the special case that $\tilde{p}_e$ is very close to zero, the emission rates are not detectable by the observation network, and their optimization cannot be expected to succeed.

The ensemble approach to determine the DFS
The ensemble Kalman smoother (EnKS) is a frequently applied tool for problems with a large number of control variables in the field of data assimilation [e.g. Anderson (2001), Evensen (2009)]. In this section, in order to identify the potential capacities of observation networks to optimize the concentration and especially the poorly known emission rates for high-dimensional problems, we introduce the ensemble-based version of the approach in Sect. 3. According to Appendix C, analogously to Sect. 3, we obtain an expression for the ensemble counterpart $\bar{P}$ of $\tilde{P}$. Similar to Sect. 3, we can also divide $\bar{P}$ into block form according to the dimensions of the concentration and emission rates. Correspondingly, we obtain the ensemble degree of freedom for signal of the $j$-th element in $c(t_0)$ and $e(t_0)$ in terms of $v^c_{ij}$ and $v^e_{ij}$, the $j$-th elements of $v^c_i$ and $v^e_i$, respectively. We observe that (29) and (14) have a similar form. By virtue of $\bar{P}^{\dagger\frac{1}{2}}(t_0|t_{-1})\,\bar{P}^f_{xy}$, we can find that the final results of (14) and (63) are equivalent. However, compared with $P^{\frac{1}{2}}(t_0|t_{-1})\,G^\top R^{-\frac{1}{2}}$, the ensemble expression $\bar{P}^{\dagger\frac{1}{2}}(t_0|t_{-1})\,\bar{P}^f_{xy}\,R^{-\frac{1}{2}}$ possesses the clear advantage that the calculation of $\bar{P}^f_{xy}$ does not require the explicit form of $G$. This allows us to code it line by line, so that our approach is computationally more efficient.
Analogously to Sect. 3, we can define the ensemble degree of freedom for signal (EnDFS) as $\|\bar{P}\|_1$ and consider each element on the diagonal of $\bar{P}$ as the contribution to the EnDFS of the corresponding model state.
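A minimal sketch of the ensemble variant is given below. It assumes that the ensemble expression combines the pseudo-inverse square root of the sampled background covariance with the sampled cross covariance between initial states and simulated observations, as the notation above suggests. The names `pinv_sqrt` and `ensemble_dfs` are hypothetical, and the linear map `G` appears only to generate the simulated observations, not inside the DFS computation.

```python
import numpy as np

def pinv_sqrt(P, rtol=1e-10):
    """Pseudo-inverse square root of a symmetric positive semi-definite matrix."""
    w, Q = np.linalg.eigh(P)
    keep = w > rtol * w.max()                    # drop numerically zero directions
    return (Q[:, keep] / np.sqrt(w[keep])) @ Q[:, keep].T

def ensemble_dfs(X0, Y, R):
    """EnDFS from an ensemble of initial states X0 (n x q) and the
    corresponding simulated observations Y (m x q)."""
    q = X0.shape[1]
    Xp = X0 - X0.mean(axis=1, keepdims=True)     # state anomalies
    Yp = Y - Y.mean(axis=1, keepdims=True)       # observation anomalies
    P = Xp @ Xp.T / (q - 1)                      # sampled background covariance
    P_xy = Xp @ Yp.T / (q - 1)                   # sampled cross covariance
    w_R, Q_R = np.linalg.eigh(R)
    R_inv_half = (Q_R / np.sqrt(w_R)) @ Q_R.T
    V, s, _ = np.linalg.svd(pinv_sqrt(P) @ P_xy @ R_inv_half, full_matrices=False)
    weights = s**2 / (1.0 + s**2)
    return weights.sum(), (V**2) @ weights

# Hypothetical example: the model is a black box that maps states to observations.
rng = np.random.default_rng(1)
n, m, q = 5, 2, 50
G = rng.standard_normal((m, n))                  # used only to simulate observations
X0 = rng.standard_normal((n, q))
Y = G @ X0
R = 0.5 * np.eye(m)

en_dfs, en_contributions = ensemble_dfs(X0, Y, R)
```

For clarity the sketch forms the full sampled covariance matrix. In a high-dimensional application one would rather work with the anomaly matrices directly, which this sketch does not attempt.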
Thus, the ensemble relative degree of freedom for signal (EnRDFS) is defined in analogy to (17), with $\tilde{P}$ replaced by $\bar{P}$. In order to distinguish the potential observabilities of the concentration and emission rates, the ensemble relative ratios of DFS retain the same form as in Sect. 3.

The sensitivity of observation networks
The above discussion about DFS aims to evaluate the capacity of a predefined measurement network to optimize the initial value and emission rates simultaneously. In Appendix D, independent of any concrete data assimilation method, we use the singular vector approach [see Buizza and Montani (1999), Buizza and Palmer (1995), Liao et al. (2006) etc.] to identify sensitive directions of observation networks to the initial value and emission rates separately and show the association with Sect. 3. From Appendix D, we can see that the singular value $s_k$ shows the amplification of the impact of the initial state on the observation configurations over the entire time interval. The associated singular vector in the state space, $v_k$, is the direction of the $k$-th fastest growth of the perturbation of observations evolving from the initial perturbation. With the special choice $W_0 = P^{-1}(t_0|t_{-1})$ and $W = R^{-1}$, we compare the sensitivity analysis with the discussion in Sect. 3. It is clear that the vector $v_k$ also points in the $k$-th direction that maximizes the relative improvement of estimates based on the Kalman smoother. This indicates that the states with higher contributions to the DFS are the same as the states that are more sensitive to the observation networks. Besides, the leading singular value $s_1$ is related to the operator norm of $\tilde{P}$ as
$$\|\tilde{P}\| = \frac{s_1^2}{1 + s_1^2},$$
which implies the upper boundedness of $\tilde{P}$. This allows us to approximate and target sensitive parameters or areas with the metric of the leading singular vectors weighted by the corresponding singular values.
Moreover, due to the homogeneity of the atmospheric transport model state vector extended with emissions, the above sensitivity analysis can be easily applied by dividing singular vectors into the block form according to the dimensions of the initial state and emissions. The corresponding blocks of different singular vectors indicate the different sensitive directions of the initial state and emissions and allow for this relative quantification. Correspondingly, we can approximate and target parameters sensitive to the existing observation networks for both the initial value and emission rates, respectively.
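Targeting with a few leading singular vectors can be sketched as a truncated version of the DFS contributions. The weighting $s_i^2/(1+s_i^2)$ used here is an assumption carried over from Sect. 3 and is not necessarily the weighting used for the figures. The function name `leading_sensitivity` and the random operator `B`, which stands in for the normalized operator $P^{\frac{1}{2}} G^\top R^{-\frac{1}{2}}$ of the extended state, are hypothetical.

```python
import numpy as np

def leading_sensitivity(B, k, n):
    """Approximate sensitivities from the k leading singular vectors of B.

    B is the normalized operator of the stacked state (2n rows). Returns the
    truncated contributions of the concentration block, of the emission block,
    and the bound s_1^2 / (1 + s_1^2) on any single contribution.
    """
    V, s, _ = np.linalg.svd(B, full_matrices=False)
    weights = s[:k]**2 / (1.0 + s[:k]**2)
    approx = (V[:, :k]**2) @ weights      # truncated diagonal contributions
    bound = s[0]**2 / (1.0 + s[0]**2)
    return approx[:n], approx[n:], bound

# Hypothetical operator for n = 3 concentration and 3 emission states, 4 observations.
rng = np.random.default_rng(2)
n = 3
B = rng.standard_normal((2 * n, 4))

sens_c_2, sens_e_2, bound = leading_sensitivity(B, k=2, n=n)
sens_c_all, sens_e_all, _ = leading_sensitivity(B, k=4, n=n)
```

Each additional singular vector adds a non-negative term, so the truncated sensitivities approach the full contributions from below as `k` grows.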

Experiment
In this section, we apply the approaches of Sects. 4 and 5 to an elementary advection-diffusion model to show how to assess the potential observability of concentrations and emission rates through the EnDFS. We can see how it helps to identify the parameters of both concentrations and emission rates that are sensitive to the given observations. We consider a linear advection-diffusion model (36) with Dirichlet horizontal (lateral) boundary conditions and a Neumann lower (surface) boundary condition in the vertical direction on the domain $[0, 14] \times [0, 14] \times [0, 4]$, where $\delta c$ and $\delta e$ are the perturbations of the concentration and the emission rates, respectively, and $K(z)$ is a differentiable function of height $z$.
In this example, the vertical coupling of horizontal grid layers is accomplished only by a diffusion operator, to avoid signal imprints following arbitrarily designed small-scale vertical advection patterns. This is considered valid, since the loss of information caused by diffusion, which reduces the signal-to-noise ratio, is significantly stronger than in the case of advection. We assume the velocities $v_x = v_y = 0.5$ and the time step $\Delta t = 0.5$. The numerical solution is based on the symmetric operator splitting technique [see Yanenko (1971)] with an operator sequence composed of $T_x$, $T_y$, $D_z$ and $A$, where $T_x$ and $T_y$ are the transport operators in the horizontal directions $x$ and $y$, and $D_z$ is the diffusion operator in the vertical direction $z$. The parameters of emission and deposition rates are included in $A$. The Lax-Wendroff algorithm is chosen as the discretization method for horizontal advection with $\Delta x = \Delta y = 1$. The vertical diffusion is discretized with $\Delta z = 1$ by the Crank-Nicolson scheme, with the Thomas algorithm [see Higham (2002)] as solver. The number of grid points is $N_g = 1125$.
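The two building blocks of the discretization can be sketched in one dimension. This is an illustration and not the code used for the experiments: the splitting sequence is not reproduced, the lateral boundary values are simply held fixed, and a zero-flux condition is assumed at both ends of the vertical column, although the text only specifies the lower boundary. All function names are hypothetical.

```python
import numpy as np

def lax_wendroff_step(c, courant):
    """One Lax-Wendroff advection step in 1D; boundary values are kept fixed."""
    new = c.copy()
    new[1:-1] = (c[1:-1]
                 - 0.5 * courant * (c[2:] - c[:-2])
                 + 0.5 * courant**2 * (c[2:] - 2.0 * c[1:-1] + c[:-2]))
    return new

def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are not used."""
    n = len(diag)
    cp, dp = np.zeros(n), np.zeros(n)
    cp[0], dp[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):                         # forward elimination
        denom = diag[i] - lower[i] * cp[i - 1]
        cp[i] = upper[i] / denom
        dp[i] = (rhs[i] - lower[i] * dp[i - 1]) / denom
    x = np.zeros(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):                # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def crank_nicolson_diffusion(c, K_half, dt, dz):
    """One Crank-Nicolson step of vertical diffusion with K given at the
    interfaces between levels and zero flux at both ends (assumption)."""
    r = dt / (2.0 * dz**2)
    k_lo = np.concatenate(([0.0], K_half))        # K at the interface below level i
    k_hi = np.concatenate((K_half, [0.0]))        # K at the interface above level i
    flux_div = np.zeros(len(c))
    flux_div[:-1] += k_hi[:-1] * (c[1:] - c[:-1])
    flux_div[1:] -= k_lo[1:] * (c[1:] - c[:-1])
    rhs = c + r * flux_div                        # explicit half of the step
    return thomas_solve(-r * k_lo, 1.0 + r * (k_lo + k_hi), -r * k_hi, rhs)

# Values from the experiment: v = 0.5, dt = 0.5, grid spacing 1, K(z) = 0.5 exp(-z^2).
dt, dz = 0.5, 1.0
courant = 0.5 * dt / 1.0
z_half = np.arange(0.5, 4.0, 1.0)                 # interfaces between the 5 levels
K_half = 0.5 * np.exp(-z_half**2)

column = np.array([1.0, 0.0, 0.0, 0.0, 0.0])      # perturbation at the surface
column_new = crank_nicolson_diffusion(column, K_half, dt, dz)
row_new = lax_wendroff_step(np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.0]), courant)
```

With the weak diffusion coefficient, very little of a surface perturbation reaches the upper levels within one step, which is consistent with the limited vertical coupling discussed for the weak diffusion case.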
With the same temporal and spatial discretization as for the concentration, the background knowledge of the emission rates is given by $e_b(t_n, i, j, l)$, $n = 1, \ldots, N$. We establish the discrete dynamic model of the emission rates according to (8),
$$\delta e(t_{n+1}) = M_e(t_{n+1}, t_n)\,\delta e(t_n),$$
where $M_e(t_{n+1}, t_n) = e_b(t_{n+1}) / e_b(t_n)$. In this section, we assume that $\delta d$ is constant over time and that the observation operator $H(t)$, mapping the state space to the observation space, is a $1 \times 2N_g$ time-invariant matrix. In Wu et al. (2016), the convergence of the numerical solution based on the above splitting and discretization method to the original solution of the partial differential equation (36) has been proved.
In our simulations, we produce $q = 500$ samples (the ensemble size) for the initial concentration and for the emission rates, respectively, from pseudo-independent random numbers, and make the states correlated by the moving average technique. Tests showed that the computational cost of our approach increases linearly with the ensemble size. In the following, we present three different tests, aiming to demonstrate the roles of variable winds, emissions, and vertical diffusion.
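The generation of a spatially correlated ensemble by a moving average can be sketched in one dimension. The window width, the grid length and the function name `correlated_ensemble` are hypothetical, since the text does not specify them, and the actual experiments use a three-dimensional grid.

```python
import numpy as np

def correlated_ensemble(n, q, window, seed=0):
    """Ensemble of q correlated samples on n grid points.

    Independent standard normal numbers are smoothed with a moving average
    of the given window; the 1/sqrt(window) scaling keeps unit variance.
    """
    rng = np.random.default_rng(seed)
    white = rng.standard_normal((n + window - 1, q))
    kernel = np.ones(window) / np.sqrt(window)
    smooth = lambda column: np.convolve(column, kernel, mode="valid")
    return np.apply_along_axis(smooth, 0, white)

# Hypothetical setting: 40 grid points, ensemble size 500 as in the text.
ensemble = correlated_ensemble(n=40, q=500, window=5)
sample_corr = np.corrcoef(ensemble)      # correlations between grid points
```

In this construction neighboring grid points share `window - 1` of their `window` random numbers, so their correlation is close to `(window - 1) / window`, while points further apart than the window are uncorrelated up to sampling noise.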
Advection tests: The following part demonstrates the potential capacity and limits of the DFS analysis tool. The prototypical examples are designed to show the expected elementary outcomes of the following situations. They exhibit the effects of the assimilation window length in relation to the emission location. These include (i) an assimilation window too short to capture emission impacts at the observation site, (ii) an extended assimilation window with balanced impacts of concentrations and emissions at the observation site, and (iii) a further extended assimilation window featuring a declining impact of initial values and a growing emission impact. The first elementary advection test (Figs. 1, 2, 3, 4, 5, 6, and 7) identifies the sensitivities of parameters for different wind directions and data assimilation windows through the EnDFS of each element of the concentrations and emission rates. Focusing on the advection effects, we apply the model with a weak diffusion process ($K(z) = 0.5e^{-z^2}$). In Figs. 1, 2, and 3 we assume southwesterly winds and data assimilation windows of $10\Delta t$, $35\Delta t$ and $48\Delta t$, respectively. The computation times are approximately 8.1 s, 28.5 s and 39.4 s in our tests with these three assimilation windows, which confirms that the computational cost increases nearly linearly with the length of the data assimilation window. The contributions of the initial states to the EnDFS are shown in the left panels of Figs. 1, 2, and 3. We find that in the horizontal field at the lowest layer ($z = 0$), the optimized field of the concentration is enlarged as the data assimilation window is extended. This is because a larger part of the concentration domain is constrained by the observations with longer data assimilation windows.
The right panels of Figs. 1, 2, and 3 show the EnDFS of the emission rate at each grid point with $z = 0$. From the right panel of Fig. 1, we can observe that the contributions to the EnDFS from emissions are less than $2 \times 10^{-3}$. Compared with the left panel of Fig. 1, the EnDFS of the emissions is obviously smaller than the EnDFS of the concentration in the influenced area. This indicates that the observations cannot detect the emission rates within the $10\Delta t$ data assimilation window. Thus, in this case only the initial values of the area adjacent to the observation site are optimized. The right panels of Figs. 2 and 3 show that emission rates play an increasingly important role in the impact of observations. In these two cases, we consider both the concentration and the emission rate as optimizable parameters. The quantitative balance between the concentration and emission rates is provided in Table 1.
The upper row panels of Fig. 4 exhibit the singular values corresponding to the results shown in Figs. 1, 2 and 3. We approximate the sensitivities of the initial concentration by the first five leading singular vectors, weighted by the associated singular values in the nuclear norm, and show the results in the lower row panels of Fig. 4. This verifies that the sensitive area can be well targeted by only a few singular vectors, although the sensitivity analysis cannot provide quantitative solutions with as clear a statistical significance as the DFS of the model. Besides, in line with expectations, the area influenced by the observation configuration depends on the wind direction and the assimilation window length.
As counterexamples, Figs. 5, 6, and 7 show the EnDFS of the concentration and emission rates under the same assumptions as Figs. 1, 2 and 3, respectively, except that a northeasterly wind is assumed. As expected, our approach demonstrates that with the adverse wind direction, emission rates are neither detectable nor improvable by the given observation configuration, whatever the duration of the assimilation window. The quantitative balances for the related figures are given in Table 1. It can be seen that the sensitivity to emission rate optimization remains equally low and is affected by numerical noise.
Emission signal tests: The purpose of the emission signal tests (Figs. 8 and 9) is to assess the impact of observation configurations on emission rates that evolve with different diurnal profiles. We make the same assumptions as for Fig. 3, except that the wind speed in Figs. 8 and 9 is increased such that the profiles of the emission rate are better detectable in relation to the observation within the assimilation window $48\Delta t$. The only distinction between the situations in Figs. 8 and 9 is the pronounced diurnal cycle in the background profile of the emission rate during the assimilation window $48\Delta t$. The different profiles of emission rates correspond to different emitted amounts of that species during the data assimilation window. Table 1 clearly shows that the distinct variation of the emission rates during the data assimilation window acts to level $\bar{p}_c$ and $\bar{p}_e$, and thus helps to improve the optimization results.
Diffusion tests: The vertical exchange of trace gases can be described by advection and diffusion, dependent on the nature of the process and the model grid resolution. In this study we confine our simulation to vertical diffusion only for the vertical coupling. The diffusion tests (Figs. 10, 11, and 12) aim to test our approach by comparing the EnDFS of the concentration and the emission rate at the layer $z = 0$ for a weak and a strong diffusion process. We assume that the observation at each time step is located at (12, 10, 4) in the diffusion tests, with $K(z) = 0.5e^{-z^2}$ in Fig. 10 and $K(z) = 0.5e^{-z^2} + 1$ in Fig. 11. Besides, Figs. 10 and 12 preserve the same assumptions as Fig. 3.
It is obvious from Figs. 3 and 10 that the different observation locations strongly influence the distribution of the concentration. Table 2 shows that, with the same diffusion coefficient, the EnDFS of the concentration in the lowest layer in Fig. 3 is clearly larger than that in Fig. 10. Moreover, it can be seen from Table 1 that the observation configuration at the top layer is not efficient for emission rates with such weak diffusion within the $48\Delta t$ data assimilation window.
Comparing Fig. 10 with Fig. 11, we can see how the EnDFS of the concentration and emission rates increases with the stronger diffusion process. The increasing impact of the observation configuration with stronger diffusion is also verified by the EnDFS and the ensemble relative ratios of DFS of the concentration and emission rate for Figs. 10 and 11 in Table 2. The balances between the concentration and emission rate for Figs. 10 and 11 are shown in Table 1. The significant difference in the weights of emission rates in Table 1 implies that the observation configuration cannot detect emission rates at the lowest layer with such weak diffusion as in Fig. 10, while with the stronger diffusion in Fig. 11 both the concentration and emission rates should be considered as optimized parameters with the corresponding weights. Finally, similar to Fig. 4, the singular values for Figs. 10 and 11 and the approximate targeting results for the sensitive parameters are shown in Fig. 12. It shows that the sensitive parameters can also be captured by a few leading singular vectors in the diffusion tests.

Conclusions and outlook
In this study we extended transport-diffusion models forced by emission rates in a novel way. Based on the Kalman smoother, we developed an approach to quantitatively identify the impact of a given observation network on the optimization of the initial trace gas state and emission rates. The contribution to the degree of freedom for signal is adopted as a criterion to evaluate the potential observability of each element in the extended state vector. The degree of freedom for signal and a number of related metrics were taken as quantitative measures of the extent to which the parameters can be optimized, in advance of the data assimilation procedure. This makes it possible to select suitable and sensitive parameters so that the optimization can be carried out more efficiently. The ensemble version of the approach makes it feasible to assess the potential observability jointly for initial values and emission rates in high-dimensional models in practical applications. In addition, we formulated sensitivities of observational networks by seeking the directions of fastest growth of the perturbation ratio between initial states and observation configurations over the entire time interval. This allows the parameters that are sensitive to the observation networks to be targeted by a few leading singular values and vectors, so that the computational cost can be further reduced. A series of experiments based on an elementary advection-diffusion model illustrated the significance of our approach in different situations.
In the future, we plan to apply this approach to a real atmospheric transport model in order to solve practical network validation problems prior to the solution of the inversion task, as long as the tangent linear assumption remains valid.

Appendix A
In this appendix, we show a more general way to extend the control vector by emissions. In some practical cases, the initial state and the emission rates do not have the same dimension. Compared with (2), the general situation leads us to consider the following model

where $B(t)$ is an operator transforming the emission state vector into the concentration-state space. Combining with (8), we obtain the extended model

Appendix B
In this appendix, we derive the foundation matrix of the DFS based on the Kalman smoother and evaluate the potential improvement of the estimate via SVD.
For the discrete-time linear system (12) with the observation mapping (13) in Sect. 3, we denote the BLUE of $x(t_i)$ based on $\{y(t_0), \cdots, y(t_k)\}$ by $\hat{x}(t_i|t_k)$, $t_i, t_k \in [t_0, \cdots, t_N]$. In particular, the prior estimate, or background state, of $x(t_0)$ is denoted by $\hat{x}(t_0|t_{-1})$. Correspondingly, $P(t_i|t_k)$ is defined as the error covariance of $\hat{x}(t_i|t_k)$, $t_i \in [t_0, \cdots, t_N]$ and $t_k \in [t_{-1}, \cdots, t_N]$. It is known that the inverse of the analysis error covariance matrix at the initial time, $P^{-1}(t_0|t_N)$, of a fixed-interval Kalman smoother is the optimal Hessian of the underlying cost function of 4D-Var (see Li and Navon (2001)). Thus, we have

$$P^{-1}(t_0|t_N) = P^{-1}(t_0|t_{-1}) + G^{\top} R^{-1} G. \qquad (41)$$

It is clear that (41) comprises the information of the initial condition, model evolution, observation configurations and errors over the entire time interval $[t_0, \cdots, t_N]$. At the same time, it is independent of any specific data and state vector, apart from the reference model evolution $M(\cdot, \cdot)$ and the observation operator $H(\cdot)$. Besides, $G^{\top} R^{-1} G$ is the observability Gramian with respect to $R^{-1}$ in control theory [see Brockett (1970)]. It represents the observation capacity of the observation networks with respect to the model. Hence (41) includes all information available before starting the data assimilation procedure.

In order to evaluate the potential improvement of the estimate by the Kalman smoother, we seek a matrix that allows a direct and normalized comparison between sensitivities to initial values and emission rates. To this end, we consider the matrix $\tilde{P}$ of the following form:

$$\tilde{P} = P^{-\frac{1}{2}}(t_0|t_{-1}) \left( P(t_0|t_{-1}) - P(t_0|t_N) \right) P^{-\frac{1}{2}}(t_0|t_{-1}) = I - P^{-\frac{1}{2}}(t_0|t_{-1}) P(t_0|t_N) P^{-\frac{1}{2}}(t_0|t_{-1}), \qquad (43)$$

where $I$ is the identity matrix and $P^{\frac{1}{2}}(t_0|t_{-1})$ satisfies $P^{\frac{1}{2}}(t_0|t_{-1}) P^{\frac{1}{2}}(t_0|t_{-1}) = P(t_0|t_{-1})$. The matrix $\tilde{P}$ is a normalized matrix of the difference between the background forecast error covariance matrix $P(t_0|t_{-1})$ and the analysis error covariance matrix $P(t_0|t_N)$. It is the foundation matrix for studying the DFS of models (Fisher (2003), Rodgers (2000), Singh et al. (2013)), which shows how much observation networks improve the estimation of model states.
$P(t_0|t_N)$ is unknown prior to the data assimilation procedure, so we use (41) to rewrite $\tilde{P}$ as

$$\tilde{P} = P^{-\frac{1}{2}}(t_0|t_{-1}) \left( P(t_0|t_{-1}) - P(t_0|t_N) \right) P^{-\frac{1}{2}}(t_0|t_{-1}) = I - \left( I + P^{\frac{1}{2}}(t_0|t_{-1}) G^{\top} R^{-1} G P^{\frac{1}{2}}(t_0|t_{-1}) \right)^{-1}. \qquad (44)$$

It is worth noting that the matrix $I + P^{\frac{1}{2}}(t_0|t_{-1}) G^{\top} R^{-1} G P^{\frac{1}{2}}(t_0|t_{-1})$ in (44) is always invertible even if the observation Gramian $G^{\top} G$ is not full-rank. Thus, $\tilde{P}$ is well-defined for all models with invertible initial covariance and observation systems with invertible error covariances within the assimilation window $t_0$ to $t_N$. Then, we apply SVD to simplify (44),

$$P^{\frac{1}{2}}(t_0|t_{-1}) G^{\top} R^{-\frac{1}{2}} = V S U^{\top},$$

where $V$ and $U$ are unitary matrices consisting of the left and right singular vectors respectively, while $S$ is the rectangular diagonal matrix consisting of the singular values. Then, (44) can be simplified as

$$\tilde{P} = \sum_{i=1}^{r} \frac{s_i^2}{1 + s_i^2} v_i v_i^{\top},$$

where $r$ is the rank of (44) and $v_i$ is the $i$th left singular vector in $V$ related to the singular value $s_i$, which is the $i$th element on the diagonal of $S$.
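The relations in this appendix can be checked numerically on a small random example. The sketch below is only an illustration under stated assumptions: the sizes `n` and `m`, the random matrices standing in for $P(t_0|t_{-1})$, $G$ and $R$, and the helper `sym_sqrt` are hypothetical, the Hessian relation (41) is taken in the form $P^{-1}(t_0|t_N) = P^{-1}(t_0|t_{-1}) + G^{\top} R^{-1} G$, and the matrix passed to the SVD is assumed to be $P^{\frac{1}{2}}(t_0|t_{-1}) G^{\top} R^{-\frac{1}{2}}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 4  # hypothetical sizes: extended state dimension, number of observations

# Hypothetical symmetric positive definite background covariance P(t0|t-1),
# diagonal observation error covariance R, and stacked observation operator G.
A = rng.standard_normal((n, n))
P_b = A @ A.T + n * np.eye(n)
R = np.diag(rng.uniform(0.5, 2.0, size=m))
G = rng.standard_normal((m, n))

def sym_sqrt(M):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, Q = np.linalg.eigh(M)
    return (Q * np.sqrt(w)) @ Q.T

P_half = sym_sqrt(P_b)
P_half_inv = np.linalg.inv(P_half)
R_inv = np.linalg.inv(R)
R_half_inv = np.diag(1.0 / np.sqrt(np.diag(R)))

# Analysis covariance from the assumed Hessian relation (41).
P_a = np.linalg.inv(np.linalg.inv(P_b) + G.T @ R_inv @ G)

# Normalized matrix: definition and the data-independent form (44).
P_tilde_def = P_half_inv @ (P_b - P_a) @ P_half_inv
P_tilde_44 = np.eye(n) - np.linalg.inv(np.eye(n) + P_half @ G.T @ R_inv @ G @ P_half)

# SVD form: weights s_i^2 / (1 + s_i^2) on the leading left singular vectors.
V, s, Ut = np.linalg.svd(P_half @ G.T @ R_half_inv)
weights = s**2 / (1.0 + s**2)
V_lead = V[:, :len(s)]
P_tilde_svd = (V_lead * weights) @ V_lead.T

dfs = np.trace(P_tilde_44)  # degree of freedom for signal as the trace
print("DFS from trace:", dfs)
print("DFS from singular values:", weights.sum())
```

In this toy setting the three expressions agree to rounding error, and the trace lies between 0 and the number of observations, since every weight $s_i^2/(1+s_i^2)$ is smaller than 1.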

Appendix C
In this appendix, we investigate the ensemble version of the expressions in Appendix B. For the discrete-time system (12), we denote the ensemble samples of $\hat{x}(t_i|t_j)$ by $\hat{x}_k(t_i|t_j)$, $i, j = 1, \cdots, N$, $k = 1, \cdots, q$, where $q$ is the number of ensemble members. Correspondingly, the ensemble mean of $\hat{x}(t_i|t_j)$ is given by

$$\bar{x}(t_i|t_j) = \frac{1}{q} X(t_i|t_j) 1_{q \times 1},$$

where $X(t_i|t_j) = (\hat{x}_1(t_i|t_j), \hat{x}_2(t_i|t_j), \cdots, \hat{x}_q(t_i|t_j))$ is the $n \times q$ ensemble matrix and $1_{i \times j}$ is an $i \times j$ matrix of which each element is equal to 1. We calculate the ensemble forecast and analysis covariances as

where $\tilde{X}(t_i|t_j) = X(t_i|t_j) - \frac{1}{q} X(t_i|t_j) 1_{q \times q}$ is the related perturbation matrix. We define the ensemble observation configurations in the entire assimilation window as

Further, the ensemble mean and the forecasting error covariance matrix of the ensemble observation configurations are given by

Similarly, we denote the ensemble covariance between the initial states and the forecasting observations by $\bar{P}^f_{xy}$. Furthermore, we define the ensemble observations as

and then assume

It is shown by Evensen (2009) that the ensemble forecast and analysis covariances have the same form as the covariances in the standard Kalman filter. However, the ensemble size $q$ is significantly smaller than the dimension of the model $n$ in practical applications. As a result, the initial ensemble covariance $\bar{P}(t_0|t_{-1})$ is not invertible. In this case, the pseudo inverse is a widely used alternative to the inverse of a matrix, due to its optimality and uniqueness properties. We denote the pseudo inverse of a matrix $A$ by $A^{\dagger}$. Then, for the initial ensemble covariance we apply the singular value decomposition to

where $V_0 \in \mathbb{R}^{n \times n}$ and $U_0 \in \mathbb{R}^{q \times q}$ consist of the left and right singular vectors respectively, and $S_0 \in \mathbb{R}^{n \times q}$ is a rectangular diagonal matrix with singular values $\{s_{0i} \mid s_{0i} > 0\}_{i=1}^{q}$ on its diagonal. Thus,

with $r_0$ being the rank of $S_0$.
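Hence, we find a pseudo inverse

$$\bar{P}^{\dagger \frac{1}{2}}(t_0|t_{-1}) = V_0 \hat{S}_0^{\dagger} V_0^{\top},$$

where $\hat{S}_0^{\dagger}$ is the pseudo inverse of $\hat{S}_0$ with the diagonal $(1/s_{01}, \cdots, 1/s_{0 r_0}, 0_{1 \times (n - r_0)})$. Analogous to (43), we define $\bar{\tilde{P}}$ as

$$\bar{\tilde{P}} = \bar{P}^{\dagger \frac{1}{2}}(t_0|t_{-1}) \left( \bar{P}(t_0|t_{-1}) - \bar{P}(t_0|t_N) \right) \bar{P}^{\dagger \frac{1}{2}}(t_0|t_{-1}).$$

Likewise, corresponding to (13), we present the observation system in the entire time interval as

$$y = G x(t_0) + \nu,$$

where $y = (y^{\top}(t_0), \cdots, y^{\top}(t_N))^{\top}$, $\nu = (\nu^{\top}(t_0), \cdots, \nu^{\top}(t_N))^{\top}$ and $G$ is the observation configuration for $x(t_0)$. Then, for the analysis error covariance matrix, we obtain

$$\begin{aligned}
\bar{P}(t_0|t_N) &= \bar{P}(t_0|t_{-1}) - \bar{P}(t_0|t_{-1}) G^{\top} \left( G \bar{P}(t_0|t_{-1}) G^{\top} + R \right)^{-1} G \bar{P}(t_0|t_{-1}) \\
&= \bar{P}(t_0|t_{-1}) - \bar{P}(t_0|t_{-1}) G^{\top} R^{-\frac{1}{2}} \left( I + R^{-\frac{1}{2}} G \bar{P}(t_0|t_{-1}) G^{\top} R^{-\frac{1}{2}} \right)^{-1} R^{-\frac{1}{2}} G \bar{P}(t_0|t_{-1}) \\
&= \bar{P}(t_0|t_{-1}) - \bar{P}^f_{xy} R^{-\frac{1}{2}} \left( I + R^{-\frac{1}{2}} \bar{P}^f_{yy} R^{-\frac{1}{2}} \right)^{-1} R^{-\frac{1}{2}} (\bar{P}^f_{xy})^{\top}.
\end{aligned}$$

Further, analogous to (47), we obtain

$$\begin{aligned}
\bar{\tilde{P}} &= \bar{P}^{\dagger \frac{1}{2}}(t_0|t_{-1}) \left( \bar{P}(t_0|t_{-1}) - \bar{P}(t_0|t_N) \right) \bar{P}^{\dagger \frac{1}{2}}(t_0|t_{-1}) \\
&= \bar{P}^{\dagger \frac{1}{2}}(t_0|t_{-1}) \bar{P}^f_{xy} R^{-\frac{1}{2}} \left( I + R^{-\frac{1}{2}} \bar{P}^f_{yy} R^{-\frac{1}{2}} \right)^{-1} R^{-\frac{1}{2}} (\bar{P}^f_{xy})^{\top} \bar{P}^{\dagger \frac{1}{2}}(t_0|t_{-1}). \qquad (61)
\end{aligned}$$

Let $\sum_{i=1}^{N} m(t_i) = m$ be the number of observations available within the assimilation window. To proceed with (61), we again apply the singular value decomposition to the following matrix:

$$\bar{P}^{\dagger \frac{1}{2}}(t_0|t_{-1}) \bar{P}^f_{xy} R^{-\frac{1}{2}} = V S U^{\top},$$

where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ consist of the right and left singular vectors of $\bar{P}^{\dagger \frac{1}{2}}(t_0|t_{-1}) \bar{P}^f_{xy} R^{-\frac{1}{2}}$, respectively, and $S \in \mathbb{R}^{n \times m}$ consists of the singular values on its diagonal.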
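The ensemble construction of this appendix can be sketched on synthetic data as follows. This is a hedged illustration, not the configuration used in the experiments: the sizes `n`, `q`, `m`, the random ensemble `X`, the linear operator `G` and the diagonal `R` are hypothetical, the ensemble covariances are assumed to use the common $1/(q-1)$ normalization, and the EnDFS is taken to be the trace of the normalized ensemble matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n, q, m = 8, 5, 3  # hypothetical: state dimension, ensemble size (q < n), observations

X = rng.standard_normal((n, q))            # ensemble matrix of initial states
G = rng.standard_normal((m, n))            # hypothetical linear observation configuration
R = np.diag(rng.uniform(0.5, 2.0, size=m)) # hypothetical observation error covariance
Y = G @ X                                  # ensemble of forecast observations

# Perturbation matrices: subtract the ensemble mean via the all-ones q x q matrix.
ones_qq = np.ones((q, q))
X_pert = X - X @ ones_qq / q
Y_pert = Y - Y @ ones_qq / q

# Ensemble covariances (assumption: 1/(q-1) normalization).
P_b = X_pert @ X_pert.T / (q - 1)
P_xy = X_pert @ Y_pert.T / (q - 1)
P_yy = Y_pert @ Y_pert.T / (q - 1)

# Pseudo-inverse square root of the rank-deficient ensemble covariance,
# built from the SVD of the scaled perturbation matrix.
V0, s0, _ = np.linalg.svd(X_pert / np.sqrt(q - 1), full_matrices=False)
r0 = int(np.sum(s0 > 1e-10 * s0[0]))       # numerical rank
P_half_pinv = (V0[:, :r0] / s0[:r0]) @ V0[:, :r0].T

R_half_inv = np.diag(1.0 / np.sqrt(np.diag(R)))

# Normalized ensemble matrix in the data-independent form.
B = P_half_pinv @ P_xy @ R_half_inv
inner = np.linalg.inv(np.eye(m) + R_half_inv @ P_yy @ R_half_inv)
P_norm = B @ inner @ B.T

# The same quantity from the singular values of B.
s = np.linalg.svd(B, compute_uv=False)
en_dfs = np.sum(s**2 / (1.0 + s**2))

print("numerical rank of ensemble covariance:", r0)
print("EnDFS from trace:", np.trace(P_norm))
print("EnDFS from singular values:", en_dfs)
```

Because the mean is removed, the ensemble covariance has rank at most $q - 1$, which is why the pseudo inverse is needed. With a linear observation configuration the trace and the singular-value sum coincide, so only a few leading singular values have to be computed.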
Funding Open Access funding enabled and organized by Projekt DEAL.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.