Using ensemble adjustment Kalman filter to assimilate Argo profiles in a global OGCM
- Cite this article as:
- Yin, X., Qiao, F. & Shu, Q. Ocean Dynamics (2011) 61: 1017. doi:10.1007/s10236-011-0419-2
An ensemble adjustment Kalman filter (EAKF) is used to assimilate Argo profiles from 2008 into a global version of the Modular Ocean Model version 4. Four assimilation experiments are carried out and compared with a simulation without data assimilation, which serves as the control experiment. All experiment results are compared with the dataset of the Global Temperature–Salinity Profile Program and with satellite sea surface temperature (SST). The first experiment (Exp 1) is implemented by perturbing the temperature of the upper layers in the initial conditions (ICs) with an amplitude of 1.0°C and no ensemble inflation. The results from Exp 1 show that the simulated temperature (salinity) deviation in the upper 400 m (500 m) is reduced through Argo data assimilation; however, these deviations are increased in deeper layers. The error reduction in SST is much greater during January to June than during the rest of the year. Three more experiments are designed to understand the responses in different layers and months. Two of them test model sensitivities to the ICs: one perturbs the ICs over the vertical extent of the whole water column (Exp 2), and the other employs a smaller perturbation amplitude of 0.1°C (Exp 3). Exp 2 shows that the simulated temperature and salinity deviations are systematically reduced in the whole water column. Comparison between Exps 2 and 3 suggests that the perturbation amplitude is important. Exp 4 tests the influence of the optimal inflation factor of 5%, which is determined by a separate set of numerical tests. Exp 4 improves assimilation performance much more than the other three experiments, which use no inflation. Therefore, we conclude that the perturbation should be introduced to all model layers, that a proper perturbation amplitude is important for ocean data assimilation using the EAKF, and that ensemble inflation with an optimal inflation factor is critical to improving the skill of the EAKF analysis.
Keywords: EAKF; Argo profiles; Ocean data assimilation; Perturbation of the initial conditions; Ensemble inflation
Ocean data assimilation (ODA) is often used to reconstruct historical time series, which can help to improve our understanding of the dynamics behind ocean circulation and its evolution (e.g., Carton et al. 2000a, b; Chepurin et al. 2005; Zhang et al. 2005). ODA is also used to combine observations and numerical models to provide more accurate initial conditions (ICs) for ocean forecasts. Among the many ODA methods, ensemble Kalman filters (EnKF; Evensen 1994; Houtekamer and Mitchell 1998) can reveal the probability distribution of numerical model states through statistical analysis of an ensemble. As a consequence, ensemble methods, which have matured rapidly in recent decades, are gradually being adopted by many research groups for ocean prediction (e.g., Anderson 2003).
The EnKF was originally developed to compute approximate solutions of nonlinear filtering problems with the Kalman filter (Kalman 1960; Kalman and Bucy 1961; Courtier et al. 1993). Various methods were then developed to reduce assimilation errors and/or to decrease computational cost (Anderson 2001; Bishop et al. 2001; Pham 2001; Whitaker and Hamill 2002; Tippett et al. 2003). The ensemble adjustment Kalman filter (EAKF; Anderson 2001, 2003) is an important representative of these methods. Compared with the traditional EnKF, the EAKF avoids perturbing the observations and reduces the computational cost; it performs well even with a moderate ensemble size (Anderson 2001; Evensen 2003; Zhang and Anderson 2003). There are many successful implementations of the EAKF in ODA. Zhang et al. (2005) developed a parallelized ensemble filter system to assimilate the observations of 1980–2002 and compared the results with those from 3D variational data assimilation. Zhang et al. (2007) then applied the EAKF to a coupled climate model and analyzed the meridional overturning circulation from the assimilated results. Anderson et al. (2009) developed the Data Assimilation Research Testbed for data assimilation research, education, and development. The method has also been used in other models, such as El Niño/La Niña–Southern Oscillation models (Karspeck and Anderson 2007) and regional ocean models (Yin et al. 2010a).
Although the EAKF is increasingly used in complicated ocean models, some problems still need to be studied, such as the sampling of the ICs (Evensen 2004) and the inflation of the ensemble (Anderson 2007). Regarding the sampling of the ICs, it is necessary to test how well the perturbation methods represent the possible states of the real ocean. In this study, numerical experiments are designed to test the sensitivity to the vertical extent and the amplitude of the IC perturbation. Ensemble inflation, which increases the spread of the ensemble, can be used to prevent the ensemble members from converging. If the probability distribution function of the model states is computed from converged ensemble members, the subsequent EAKF analysis will be inaccurate and unreliable. Several approaches to obtaining a proper spread have been proposed in the recent literature. Hamill et al. (2001) analyzed ensemble mean errors as a function of the inflation factor and noticed that the optimal inflation factor was a function of ensemble size; Anderson (2007) developed an adaptive covariance inflation algorithm using a hierarchical Bayesian approach. Zhang et al. (2010) developed an adaptively inflated ensemble filter, which employs a precomputed “climatological” variance to inflate the covariance where the ensemble would otherwise have trouble encompassing the true state. In this study, the optimal inflation factor of 5% is obtained through a series of numerical tests. In order to maintain the intrinsic relationships among different variables, the same inflation factor is applied to all variables over the whole model domain.
The paper is organized as follows. Section 2 describes the data used for assimilation and validation, the global ocean general circulation model (OGCM), the modular implementation of the EAKF, the design of the experiments, and the statistical indexes used for comparison. Section 3 presents the results of these experiments, including two base experiments and three sensitivity experiments that address the vertical extent of the perturbation, the perturbation amplitude, and the optimal inflation factor. Finally, a summary and discussion are given in Section 4.
2.1 Argo data for assimilation
The Argo profiles of temperature and salinity provided by the Coriolis Argo Data Center are employed in this study. The data are arranged in daily files in NetCDF format, which makes them easy to use as input profiles for data assimilation by serial or parallel programs. The dataset is provided together with a detailed description and quality control (QC) flags. Two levels of QC are performed on this dataset: the first level is the real-time QC, which performs a set of agreed automatic checks, and the second level is the delayed-mode QC. Only those profiles that passed all real-time QC tests with a QC flag equal to 1, or that passed the delayed-mode QC, are used in our experiments. In order to handle these profiles conveniently, two user-defined types are implemented in the EAKF module: one collects the information of the Argo temperature/salinity profiles, and the other serves as an observation operator that extracts the corresponding modeled vertical profiles.
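The profile selection rule can be illustrated with a minimal sketch. The record layout below (`realtime_qc_flags`, `passed_delayed_mode_qc`) is hypothetical and is not the layout of the Coriolis NetCDF files; applying the test to a whole profile rather than level by level is also an assumption.

```python
def passes_qc(profile):
    """Keep a profile if every real-time flag equals 1 or it passed delayed-mode QC.

    The dictionary keys are hypothetical names chosen for this illustration.
    """
    realtime_ok = all(flag == 1 for flag in profile["realtime_qc_flags"])
    return realtime_ok or profile["passed_delayed_mode_qc"]


# Three made-up profiles: A is clean, B has a flag other than 1 and no
# delayed-mode QC, C has a flag other than 1 but passed delayed-mode QC.
profiles = [
    {"id": "A", "realtime_qc_flags": [1, 1, 1], "passed_delayed_mode_qc": False},
    {"id": "B", "realtime_qc_flags": [1, 4, 1], "passed_delayed_mode_qc": False},
    {"id": "C", "realtime_qc_flags": [1, 3, 1], "passed_delayed_mode_qc": True},
]

accepted = [p["id"] for p in profiles if passes_qc(p)]
```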
2.2 Data for validation
The dataset of the Global Temperature–Salinity Profile Program (GTSPP), provided by the US National Oceanographic Data Center, is used in this study for validation. Most of the Argo profiles are contained in the GTSPP dataset, but they are removed from it before we perform the validation (hereafter, GTSPP denotes the profiles in the GTSPP dataset with the Argo profiles excluded). The model results are interpolated onto the locations of the GTSPP profiles for a more accurate comparison.
The satellite sea surface temperature (SST) used for comparison is the optimally interpolated microwave (MW) SST product created from the SSTs of two satellite instruments: the Tropical Rainfall Measuring Mission Microwave Imager (TMI) and the Advanced Microwave Scanning Radiometer–Earth Observing System. This dataset, available at www.remss.com, is produced by Remote Sensing Systems and sponsored by the National Oceanographic Partnership Program, the National Aeronautics and Space Administration (NASA) Earth Science Physical Oceanography Program, and the NASA Measures Discover Project. It is an improved multisensor SST product and provides global daily averages with a horizontal resolution of 0.25° × 0.25°. Extensive comparisons are provided at that website, and the statistics show that these SSTs have a standard deviation of 0.56°C for collocations within the range of the TMI data (40°S–40°N) and a higher one of 0.65°C for the global collocations (90°S–90°N). As with the GTSPP profiles, the model outputs of all experiments are interpolated onto the grid of the satellite SST before comparison.
2.3 A global OGCM based on MOM4
The Modular Ocean Model version 4 (MOM4; Griffies et al. 2007) is used to set up a global coupled ice-ocean model. Surface wave-induced vertical mixing is included in the model based on Qiao’s parameterization (Qiao et al. 2004). The model domain is 81.5°S–89.5°N around the globe, and the horizontal resolution is 1° × 1° everywhere except in the tropical ocean (30°S–30°N), where the resolution is (1/3)° in the meridional direction. There are 50 vertical levels, with 10-m resolution for the top 220 m and coarser resolution below. The model topography is interpolated from a gridded bathymetric dataset of 5′ × 5′ resolution (ETOP5 1986), with the maximum depth set to 5,500 m. As a component of MOM4, the Sea Ice Simulator is a dynamic/thermodynamic sea ice model that employs a three-layer scheme for the thermodynamics and full dynamics with internal ice forces calculated using elastic-viscous-plastic rheology (Winton 2000).
The model uses the annual mean temperature and salinity from Levitus and Boyer (1994) as its ICs, and is driven by climatological atmosphere forcing from the Ocean Model Intercomparison Project (OMIP; WOCE/CLIVAR 2002). The model state after 11 years of spin-up is taken as the ICs for the year of 2000 for simulation with surface forcing from the National Centers for Environmental Prediction reanalysis data (Kalnay et al. 1996) provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA, including wind, atmosphere temperature and sea level pressure. The rest of the forcing is from the climatological dataset of OMIP. More details, including validation of this model, can be found in Shu et al. (2011).
2.4 An EAKF module for Argo profiles
The assimilation system used here is composed of two main parts: the ensemble members and the EAKF module. The ensemble members, which are integrations of the same model from differently perturbed ICs, run separately. At the beginning, the EAKF module starts first to collect the observation information (including the observation times, locations, and values); it then sends each ensemble member the next observing time and waits for the predicted ensemble states. All the ensemble members start the model integration together to predict the ensemble states of the ocean. Once the predicted ensemble states at the observing time are available, the EAKF module receives them and performs the EAKF analysis to update the ensemble states. The updated results are sent back to each ensemble member, and the ensemble integration continues to the next observing time.
The whole system, including the ensemble members and the EAKF module, can be orchestrated by UNIX shell scripts or by parallel programming interfaces such as MPI and OpenMP. If the assimilation frequency is not too high, say once a day, it is more efficient to orchestrate the system with scripts, with the required information exchanged through input/output files. This also makes the EAKF module more portable from the MOM4 used here to other ocean models; all that is needed is to write some general procedures to handle the variables of the different models. If the assimilation frequency is very high, it is necessary to combine the EAKF process with the ensemble members in order to exchange information faster. In most cases, the observation frequency is not so high, so arranging the system with scripts is an efficient choice.
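The forecast/analysis cycle with file-based exchange can be sketched as follows. This is only an illustration of the control flow under simplifying assumptions: each member's state is a single number, `advance_member` is a hypothetical stand-in for an ocean-model integration, the JSON file names are invented, and the analysis step is a placeholder rather than the EAKF.

```python
import json
import tempfile
from pathlib import Path


def advance_member(state, n_steps):
    """Toy stand-in for one model integration (hypothetical dynamics)."""
    for _ in range(n_steps):
        state = 0.9 * state + 1.0
    return state


def shrink_toward_mean(prior, t_obs):
    """Placeholder analysis (NOT the EAKF): halves each deviation from the mean."""
    mean = sum(prior) / len(prior)
    return [mean + 0.5 * (x - mean) for x in prior]


def run_cycle(workdir, initial_states, obs_times, analysis):
    """Alternate forecast and analysis, exchanging states through files."""
    workdir = Path(workdir)
    states = list(initial_states)
    n = len(states)
    t_now = 0
    for t_obs in obs_times:
        # Forecast: each member integrates to the observing time and writes its state.
        for i, s in enumerate(states):
            record = {"time": t_obs, "state": advance_member(s, t_obs - t_now)}
            (workdir / f"forecast_{i:03d}.json").write_text(json.dumps(record))
        # Analysis: the filter module reads all forecasts and writes the updated states.
        prior = [json.loads((workdir / f"forecast_{i:03d}.json").read_text())["state"]
                 for i in range(n)]
        for i, s in enumerate(analysis(prior, t_obs)):
            record = {"time": t_obs, "state": s}
            (workdir / f"analysis_{i:03d}.json").write_text(json.dumps(record))
        # Restart: each member continues from its analysis file.
        states = [json.loads((workdir / f"analysis_{i:03d}.json").read_text())["state"]
                  for i in range(n)]
        t_now = t_obs
    return states


with tempfile.TemporaryDirectory() as tmp:
    final_states = run_cycle(tmp, [0.0, 5.0, 20.0], obs_times=[1, 2, 3],
                             analysis=shrink_toward_mean)
```

Because the members and the filter communicate only through files, replacing `advance_member` with a call to a different model would leave the driver unchanged, which is the portability argument made above.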
The EAKF module specified here assimilates the Argo temperature and salinity profiles. When the Argo profiles are assimilated into the ocean model, the EAKF analysis is multivariate: once the Argo temperature profiles are used to adjust the model temperature, the model salinity is also adjusted through the covariance between salinity and temperature. A similar procedure is performed for the Argo salinity profiles. The standard deviations of the observation errors for temperature and salinity are chosen to be 1°C and 0.2 psu, respectively. Covariance localization is performed with a polynomial function, as in Zhang et al. (2005), and the distance scales used in this function are set to 2° horizontally, 100 m vertically, and 5 days in time.
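The multivariate adjustment can be illustrated with a minimal sketch of the two-step ensemble adjustment update for a single scalar observation, following the sequential formulation of Anderson (2003). The ensemble values are made up. The localization function shown is the fifth-order Gaspari–Cohn polynomial, a common choice; that it is the exact function used here, and that 2° acts as its half-width, are assumptions.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    """Sample covariance of two equally sized ensembles."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def obs_increments(prior_obs, obs_value, obs_err_sd):
    """Step 1: ensemble adjustment increments in observation space."""
    prior_mean = mean(prior_obs)
    prior_var = covariance(prior_obs, prior_obs)
    obs_var = obs_err_sd ** 2
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + obs_value / obs_var)
    shrink = math.sqrt(post_var / prior_var)
    return [post_mean + shrink * (y - prior_mean) - y for y in prior_obs]


def regress_onto(state, prior_obs, increments, localization=1.0):
    """Step 2: regress the observation increments onto a state variable."""
    gain = localization * covariance(state, prior_obs) / covariance(prior_obs, prior_obs)
    return [x + gain * d for x, d in zip(state, increments)]


def gaspari_cohn(distance, half_width):
    """Fifth-order compactly supported correlation function (assumed form)."""
    r = abs(distance) / half_width
    if r >= 2.0:
        return 0.0
    if r <= 1.0:
        return -0.25 * r**5 + 0.5 * r**4 + 0.625 * r**3 - (5.0 / 3.0) * r**2 + 1.0
    return (r**5 / 12.0 - 0.5 * r**4 + 0.625 * r**3 + (5.0 / 3.0) * r**2
            - 5.0 * r + 4.0 - 2.0 / (3.0 * r))


# Made-up five-member ensemble: model temperature at the observation point
# and model salinity at a nearby grid point.
temp_prior = [14.0, 15.0, 16.0, 17.0, 18.0]
salt_prior = [35.4, 35.2, 35.1, 34.9, 34.7]

# One temperature observation with the 1 degC error standard deviation quoted above.
increments = obs_increments(temp_prior, obs_value=15.0, obs_err_sd=1.0)
temp_post = regress_onto(temp_prior, temp_prior, increments)
salt_post = regress_onto(salt_prior, temp_prior, increments)

# The same salinity update, damped for a grid point 1 degree away (assumed half-width 2).
weight = gaspari_cohn(1.0, 2.0)
salt_post_far = regress_onto(salt_prior, temp_prior, increments, localization=weight)
```

In this sketch the salinity ensemble is never compared with a salinity observation; it moves only because it covaries with temperature, and the localization weight shrinks that adjustment with distance.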
2.5 Experiment design
2.5.1 Base experiments (CTL and Exp 1)
The year 2008 is the focus period of this study, and the model simulation without ODA is referred to as the control run (CTL). The ICs of the ensemble members are prepared from CTL. The method for generating random fields given by Evensen (1994) is used here to perturb the surface fields, and the surface perturbations are then smoothly projected onto the subsurface layers. The random fields are spatially smooth, and their spatial correlation decreases with increasing distance. Therefore, this kind of perturbation will not break the smoothness of the integration itself.
In Eq. 1, the vertical layers are divided into three parts: the first part is the upper mixed layer, with a perturbation of CP at the surface and 0 at the bottom of the upper mixed layer; the second part includes the layers whose perturbations are smoothly reduced to 0 from the perturbation in layer k1, in case the bottom of the upper mixed layer is not located exactly on a vertical model grid level; and the third part consists of the remaining layers, which are kept unperturbed. The second part is chosen to be five layers or fewer and may include the bottom layer if the third part contains no layers.
In order to simplify the perturbation of the ICs, only the temperature field is perturbed for the ensemble members. The perturbations are normalized by C to ensure that the root mean square (RMS) of the surface perturbation equals a specified value, defined as the perturbation amplitude. For Exp 1, the perturbation amplitude is 1°C. Accordingly, the perturbed layers in Exp 1 are limited to the upper layers of the ocean, which in general extend only a few hundred meters below the sea surface.
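The normalization by C can be sketched as below. The raw field here is plain Gaussian noise used only as a stand-in; it is not the spatially correlated random field of Evensen (1994), and the function names are illustrative.

```python
import math
import random


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


def normalize_to_amplitude(field, amplitude):
    """Scale a perturbation field by C so that its RMS equals the amplitude."""
    c = amplitude / rms(field)
    return [c * v for v in field]


# Stand-in surface perturbation (uncorrelated noise, for illustration only).
rng = random.Random(0)
raw_field = [rng.gauss(0.0, 3.0) for _ in range(1000)]

pert_1p0 = normalize_to_amplitude(raw_field, 1.0)   # amplitude of 1.0 degC
pert_0p1 = normalize_to_amplitude(raw_field, 0.1)   # amplitude of 0.1 degC
```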
2.5.2 Sensitivity experiments (Exps 2–4)
Table: List of assimilation experiments (columns include the inflation factor, %)
In this way, if the temperature in a particular layer has the maximum difference from the surface temperature, its perturbation will be zero. In most cases, the bottom of the water column has the maximum temperature difference from the surface, so the perturbation in the bottom layer will be zero. In general, the deeper part of the ocean has a smaller perturbation.
This perturbation method is used to perturb the initial temperature fields for the sensitivity experiments. The sensitivity to the perturbation amplitude is also tested by setting the perturbation amplitude to 1.0°C and 0.1°C in Exps 2 and 3, respectively.
2.6 Statistic indexes for comparison
The RMS error is employed to measure the difference between the observations and the results of our numerical experiments. In order to compare the model with the daily mean satellite SST and the GTSPP profiles consistently, the model output is saved as daily averages. For the GTSPP profiles, RMS errors in temperature and salinity are computed in layers with a thickness of 100 m. The RMS errors for the whole year of 2008 are computed first, and then the RMS errors on each day are computed. For satellite SST, two kinds of RMS error are computed in this study: one is the temporal RMS error, obtained over the spatial SST array at each time, and the other is the spatial RMS error, computed from the time series at each grid point.
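The two SST error measures differ only in the dimension over which the squared differences are averaged. A minimal sketch, assuming the model and satellite SSTs are already on the same grid and stored as `field[time][grid_point]` (the numbers are made up):

```python
import math


def rmse(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def rmse_time_series(model, obs):
    """One RMS error per time, computed over the spatial array at that time."""
    return [rmse(m_row, o_row) for m_row, o_row in zip(model, obs)]


def rmse_map(model, obs):
    """One RMS error per grid point, computed from the time series at that point."""
    n_points = len(model[0])
    return [rmse([row[j] for row in model], [row[j] for row in obs])
            for j in range(n_points)]


# Two days, three grid points.
model_sst = [[20.0, 15.0, 10.0],
             [21.0, 15.0, 10.0]]
satellite_sst = [[19.0, 15.0, 10.0],
                 [19.0, 15.0, 12.0]]

errors_by_day = rmse_time_series(model_sst, satellite_sst)
errors_by_point = rmse_map(model_sst, satellite_sst)
```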
3 Analysis of assimilation results
3.1 Comparison with the GTSPP profiles
In Exp 1, the RMS error in temperature (salinity) after Argo data assimilation is reduced in the layers shallower than 400 m (500 m). However, both temperature and salinity errors in Exp 1 are increased in the deeper part of the ocean. This indicates that only perturbing upper layers of the ocean may lead to an inaccurate EAKF analysis in the deeper part of the ocean.
To understand and solve the problem seen in Exp 1, we carried out two sensitivity experiments to check whether the ODA performance could be improved by perturbing the ICs with a different vertical extent or amplitude. In Exp 2, the temperature of the ICs is perturbed in all layers of the water column. As a consequence, the RMS errors in temperature and salinity relative to the GTSPP profiles are systematically reduced in all layers. Comparing the results of Exps 2 and 3, in which the perturbation amplitude changes from 1.0°C to 0.1°C, we find that a smaller perturbation amplitude degrades the ODA performance.
Recall that a 5% ensemble inflation is applied in Exp 4 to increase the ensemble spread. The vertical profiles of the RMS errors of temperature and salinity show that Exp 4 gives the best results among all the experiments. In the layers shallower than 700 m, the temperature error in Exp 4 is comparable to that in Exp 2, whereas the simulated salinity error is much reduced after including the ensemble inflation. In the layers deeper than 700 m, the simulated temperature error is reduced more in Exp 4 than in Exp 2. For salinity, the errors of Exps 2 and 4 are quite similar in the layers deeper than 1,000 m, where the salinity errors are already quite small.
Since only the temperature fields were perturbed in the ICs, the ensemble spread of salinity is generated solely by the model integration itself. The variance of salinity is relatively small, so the ensemble spread of salinity is not large enough to provide accurate filtering. Likewise, the variance of temperature is smaller in the deep layers than in the upper layers, which leads to a smaller ensemble spread in the deep layers. As a result, ensemble inflation works well on salinity in the upper layers and on temperature in the deeper layers. This indicates that inflation benefits those parts of the model state where the ensemble spread is small.
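A minimal sketch of multiplicative ensemble inflation is given below. Whether the 5% factor scales the deviations from the ensemble mean (as assumed here) or the variance is not stated in the text, so this is one possible reading; the salinity values are made up.

```python
import statistics


def inflate(ensemble, inflation_percent):
    """Expand deviations from the ensemble mean by (1 + inflation_percent / 100).

    Assumption: the inflation factor acts on deviations, so the ensemble
    standard deviation grows by the same percentage and the mean is unchanged.
    """
    factor = 1.0 + inflation_percent / 100.0
    ens_mean = sum(ensemble) / len(ensemble)
    return [ens_mean + factor * (x - ens_mean) for x in ensemble]


salinity_ensemble = [34.90, 34.95, 35.00, 35.05, 35.10]
inflated = inflate(salinity_ensemble, 5.0)

spread_ratio = statistics.stdev(inflated) / statistics.stdev(salinity_ensemble)
```

Applying the same factor to every variable at every grid point preserves the correlations between variables, which matches the stated motivation for using a single inflation factor over the whole model domain.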
Overall, the assimilation results are improved by perturbing all layers, and the perturbation amplitude is important for the error reduction during the beginning period of the ODA. Ensemble inflation is critical to improving the skill of the EAKF analysis.
3.2 Comparison with satellite SST
Since satellite SST data have good spatial and temporal coverage of the world ocean, the comparison between modeled and satellite SSTs can provide some insight into the spatial and temporal evolution of the simulation error.
The AI, which represents the percentage reduction of the RMS error achieved by ODA, is shown in Fig. 7b. The comparison between Exps 2 and 3 shows that the perturbation amplitude is important for the beginning period of ODA, but not so for the later period. Once the model state is perturbed, the model adjusts according to its physics to reach a new balanced state during the beginning period. If the perturbation is very small, this period of adjustment will be very short. This is why the AI increases gradually in Exp 3, whereas it jumps quickly to a high level at the beginning of Exps 1 and 2. The perturbations in the deep layers of the model ocean remain for a long time because the variability in the deep layers is weak. Although the perturbation at the beginning is relatively small in Exp 3, the growing mode of model errors would increase the ensemble spread in the later period of integration (Yin and Oey 2007) and thus improve the performance of Exp 3 in the last few months. After the ensemble inflation is applied in Exp 4, the AI increases and remains the highest throughout the ODA period.
In order to show the improvement from the ODA clearly, the percentage error reduction in SST is given in Fig. 8c for Exp 1. Since the AI represents a relative error reduction, similar AI values in different regions can correspond to different absolute error reductions. The distribution shows that the SST error is reduced in most regions, where the AI is positive. However, in some regions, such as the equatorial Atlantic Ocean, the SST error is not reduced (AI equal to zero) or is even increased (negative AI).
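The defining formula of the AI is not reproduced in this text. The sketch below assumes the form implied by its description as a percentage reduction of the RMS error relative to the control run, AI = 100 × (RMSE_CTL − RMSE_Exp) / RMSE_CTL; the function name and the numbers are illustrative only.

```python
def assimilation_improvement(rmse_ctl, rmse_exp):
    """Percentage reduction of RMS error relative to the control run (assumed form)."""
    return 100.0 * (rmse_ctl - rmse_exp) / rmse_ctl


# The same AI can hide very different absolute error reductions.
ai_large_error = assimilation_improvement(rmse_ctl=2.0, rmse_exp=1.0)   # absolute reduction 1.0
ai_small_error = assimilation_improvement(rmse_ctl=0.4, rmse_exp=0.2)   # absolute reduction 0.2

# No change gives zero; a larger error after assimilation gives a negative value.
ai_unchanged = assimilation_improvement(rmse_ctl=1.0, rmse_exp=1.0)
ai_degraded = assimilation_improvement(rmse_ctl=1.0, rmse_exp=1.2)
```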
These comparisons with satellite SST suggest that the perturbation should be introduced to all model layers, proper perturbation amplitude is important for ODA using EAKF, and the ensemble inflation by an optimal inflation factor can improve the performance of Argo data assimilation.
4 Summary and discussions
An EAKF module is designed for parallel computing by splitting the model domain into several blocks with overlapping computing regions. The associated EAKF system is arranged for separate computing, with information exchanged through input/output files. The EAKF module is used in a global OGCM based on MOM4 to assimilate Argo profiles (both temperature and salinity) in 2008. Five experiments are carried out: the CTL, which has no ODA; Exp 1, in which the ICs are perturbed only in the upper layers of the ocean; Exps 2 and 3, which test the number of perturbed layers and the perturbation amplitude of the ICs; and Exp 4, which examines ensemble inflation with an optimal inflation factor.
The comparisons of the model results with the GTSPP profiles (Figs. 4, 5, and 6) show that the temperature and salinity RMS errors are reduced in the layers shallower than 500 m after Argo data assimilation. In the layers deeper than 500 m, however, the results of Exp 1 become worse than those of CTL. This indicates that perturbing only the upper layers of the ocean is not enough. Once all layers of the water column are perturbed in Exps 2–4, the temperature and salinity errors are systematically reduced compared with CTL. The comparisons in the vertical and in time show that perturbing all layers can improve the results not only in the deeper part but also in the upper part of the ocean. The perturbation amplitude (Exps 2 and 3) causes a great difference only in the beginning period of the ODA. The optimal ensemble inflation of 5% improves the performance of Argo data assimilation and gives the best result among all the experiments carried out in this study.
Further comparison with satellite SST is carried out to confirm the results of the comparison with the GTSPP profiles. The performance of Exp 1 in the second half of 2008 is, however, not as good as in the first half of the year. The experiments with different perturbation layers and amplitudes (Exps 1–3) indicate that perturbing all layers of the ocean is much better than perturbing only the upper ocean and that the perturbation amplitude is important for the beginning period of ODA. The results of Exp 4 show that an optimal inflation factor of 5% can indeed improve the assimilation performance.
Only the sea temperature in the OGCM is perturbed, over the whole water column and according to its variance, so that a finite temperature ensemble spread is generated directly. Because the model variables are related to one another, ensemble spreads of the unperturbed variables are also generated as the model adjusts during numerical integration. This perturbation method is easy to implement, and the induced ensemble spread preserves the dynamical balance inside the model well. Compared with the model uncertainties, however, the induced ensemble spread may sometimes be too small. As a result, the EAKF analysis becomes inaccurate, and ensemble inflation is necessary for better assimilation performance. Further efforts to improve the perturbation method from a dynamical point of view should be made in the future.
Since high costs of computation time and storage are needed to determine the optimal inflation factor in this study, a more efficient way should be sought in the future. In addition, other methods for ensemble inflation, such as the adaptive covariance inflation error correction algorithm, should be tested further. We plan to include in this EAKF module many other kinds of observations, including satellite SST, sea-surface height from satellite altimeter, and other in situ temperature/salinity profiles, for more realistic applications.
The work was jointly supported by the Project of the National Basic Research Program of China under contract No. 2007CB816002 and a special fund for the Fundamental Scientific Research under contract No. 2008 G08.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.