LORA: a local ensemble transform Kalman filter-based ocean research analysis

We have produced an eddy-resolving local ensemble transform Kalman filter (LETKF)-based ocean research analysis (LORA) for the western North Pacific (WNP) and Maritime Continent (MC) regions (LORA-WNP and LORA-MC, respectively). This paper describes the system configuration and validation comparisons with Japan Coastal Ocean Predictability Experiment 2M (JCOPE2M) reanalysis and Archiving, Validation, and Interpretation of Satellite Oceanographic Data (AVISO) observational datasets. The results show that the surface horizontal velocity in the LORA-WNP is closer to independent drifter buoy observations in the mid-latitude region, especially along the Kuroshio Extension (KE), and is less close in the subtropical region than the JCOPE2M, although the AVISO is the closest over the whole domain. The sea surface temperatures (SSTs) in the LORA-WNP correspond better to assimilated satellite observations than the JCOPE2M over most of the domain except for coastal regions. The results using an independent buoy south of the KE indicate that better fit of temperature in the LORA-WNP may be limited to the upper 300 m depth, probably because of the prescribed vertical localization cutoff length of 370 m. In the MC region, the surface velocity in the LORA-MC is closer to the independent drifter buoys in the equatorial coastal region and is less close in the offshore region than the AVISO. The SSTs in the LORA-MC correspond better to the assimilated satellite observations in the offshore region than the nearshore region. Therefore, the LORA-WNP and LORA-MC have sufficient accuracy for geoscience research applications as well as for fisheries, marine transport, and environment consultants.


Introduction
Data assimilation optimally combines simulation and observations with statistical methods and dynamical systems theories, and it results in the statistically best estimates (i.e., analyses). With the development of numerical simulation and satellite and in situ observations, various operational centers and research institutions have produced atmosphere and ocean analysis datasets (e.g., Kalnay et al. 1996;Carton et al. 2018). Historical analysis datasets reconstructed using a single consistent data assimilation system are specifically referred to as reanalyses. These analysis and reanalysis datasets are utilized for a variety of geoscience studies, as they provide accurate gridded datasets without missing values.
There are various data assimilation methods suitable for high-dimensional ocean data assimilation systems, such as three-and four-dimensional variational assimilation (3D-and 4D-VAR, respectively), Kalman filters, ensemble optimal interpolation (EnOI), and ensemble Kalman filters (EnKFs). As summarized in review papers Martin et al. 2015), the 3D-VAR seems to be the most used among the currently available ocean reanalysis datasets. Global high-resolution reanalysis datasets, such as Ocean ReAnalysis System 5 (ORAS5; Zuo et al. 2019), Global Ocean Physics Reanalysis (GLORYS12; https:// data. marine. coper nicus. eu/ produ ct/ GLOBAL_ MULTI YEAR_ PHY_ 001_ 030/ descr iption), and Japan Coastal Ocean Predictability Experiments-Forecasting Global Ocean (JCOPE-FGO; Kido et al. 2022), have been recently produced by the 3D-VAR systems. Although the EnOI is applied to the global ocean and coupled data assimilation systems of Bluelink ReANalysis (BRAN; Chamberlain et al. 2021) and Goddard Earth Observing System Subseasonal to Seasonal prediction (GEOS-S2S) system version 2 (Molod et al. 2020), respectively, the background forecast matrix in the EnOI is static and differs from the EnKF. The EnKF is only adopted by a single reanalysis dataset referred to as TOPAZ4 (Sakov et al. 2012;Xie et al. 2017). TOPAZ4 is created using a regional ocean-sea ice data assimilation system constructed for the North Atlantic and Arctic regions with high horizontal resolution of 12-16 km, but the dataset only for the Arctic region is released through the Copernicus Marine Environment Monitoring Service website (CMEMS; https:// marine. coper nicus. eu/).
In the Pacific region, several high-resolution regional analysis datasets have been produced, for example, the Japan Fisheries Research and Education Agency-Regional Ocean Modelling System II (FRA-ROMS II; Kuroda et al. 2017) with 3D-VAR, the Japan Coastal Ocean Predictability Experiment 2M (JCOPE2M; Miyazawa et al. 2017) with multi-scale 3D-VAR (Li et al. 2015), the Fourdimensional variational Ocean Re-analysis for the Western North Pacific over 30 years (FORA-WNP30; Usui et al. 2017) with the 4D-VAR, and the Data assimilation Research of the East Asian Marine System (DREAMS; Hirose et al. 2013) with a Kalman filter with reducedorder approximation (Fukumori and Malanotte-Rizzoli 1995). To the best of the authors' knowledge, there are no EnKF-based high-resolution analysis datasets available for the Pacific region.
We have developed an eddy-permitting EnKF-based regional ocean data assimilation system for the western North Pacific (WNP) region (Ohishi et al. 2022a;Ohishi et al. 2022b), in which satellite and in situ observations are assimilated at a shorter interval (1 day) than 5 days used for the TOPAZ4. Sensitivity experiments have demonstrated that a combination of relaxation-to-prior perturbation (RTPP; Zhang et al. 2004;Kotsuki et al. 2017), incremental analysis update (IAU; Bloom et al. 1996), and adaptive observation error inflation (AOEI; Minamide and Zhang 2017) are required to assure sufficient dynamical balance and accuracy. With the recent enhancement of high-performance computing resources, we are now able to integrate EnKFbased ocean data assimilation systems with sufficiently high horizontal resolution to resolve fronts and eddies. This study constructs two eddy-resolving analysis datasets for the WNP and Maritime Continent (MC) regions (Fig. 1a), where an infrared sensor on the geostationary Himawari-8 satellite has observed sea surface temperatures (SSTs) used for assimilation since July 2015 (Bessho et al. 2016;Kurihara et al. 2016).
In the WNP region, numerous fronts and eddies are distributed around western boundary currents and their extension regions (e.g., Kida et al. 2015). It has long been recognized that the mid-latitude atmosphere forces the ocean field through surface turbulent heat fluxes (Cayan 1992), entrainment at the base of the mixed layer (Miller et al. 1994), and Ekman transport (Yasuda and Hanawa 1997). With the progress of satellite observations and numerical simulation, the impact of ocean fronts and eddies on atmospheric stability and pressure has been highlighted (e.g., Nonaka and Xie 2003;Minobe et al. 2008;Nakamura et al. 2008), and thus mid-latitude air-sea interaction has attracted much attention. Studies on coupled atmosphere-ocean data assimilation that consider the exchange and covariance of uncertainty between the atmosphere and ocean fields have recently been performed (e.g., Sluka et al. 2016;Penny et al. 2019). Ensemble ocean analysis datasets would be useful not only for ocean research but also for atmospheric research to assess the impacts of the uncertainty of the oceanic field on the atmosphere.
There are few in situ observations in the MC region, especially around the shallow equatorial coastal area (Gulf of Thailand, Java Sea, Celebes Sea, Banda Sea, Arafura Sea, and Timor Sea) compared with the WNP region (Fig. 1b, c). The Intergovernmental Oceanographic Commission (IOC) Sub-Commission for the western Pacific (WESTPAC) has supported sustained observations and services around the Indo-Pacific region over long periods. An international project "CoastPredict" (https:// www. coast predi ct. org/) has recently been endorsed by the United Nations Educational, Scientific and Cultural Organization (UNESCO)/IOC as a program under the United Nations Decade of Ocean Science for Sustainable Development, and one of its objectives is the co-design and implementation of an integrated coastal ocean observing and forecasting system. Therefore, it is expected that the coastal observation networks around the MC region will be enhanced in the next few decades. The WESTPAC has already published an ocean forecasting system (https:// iocwe stpac. org/ ofs/ 897. html), but details including the model configuration, data assimilation, and validation have not yet been reported. In the South China Sea (SCS), cyclonic (anticyclonic) ocean circulation develops during the boreal winter northeasterly (summer southwesterly) monsoon, and thus the atmospheric and oceanic fields undergo remarkable seasonal variations (e.g., Xie et al. 2003;Thompson et al. 2016). Consequently, accurate analysis datasets are required for fisheries, shipping companies, and environmental consultants as well as physical and biogeochemical oceanographers.
This study aims to create new regional ocean research analysis datasets, the local ensemble transform Kalman filter (LETKF)-based ocean research analysis (LORA), in the WNP and MC regions, and to provide detailed descriptions including system setting and validation comparison. Configurations of EnKF-based regional ocean data assimilation systems are described in Sect. 2, and the methods used to evaluate and compare validation of the LORA and existing datasets are provided in Sect. 3. The climatological ensemble mean and spread as well as the results of validation comparison of the LORA in the WNP are shown in Sect. 4, and those for the MC region are given in Sect. 5. A summary is presented in the final section.

LETKF-based ocean data assimilation system
This study uses an EnKF-based regional ocean data assimilation system, sbPOM-LETKF (Ohishi et al. 2022a;Ohishi et al. 2022b), which consists of the Stony Brook Parallel Ocean Model (sbPOM; Jordi and Wang 2012) and the local ensemble transform Kalman filter (LETKF; Hunt et al. 2007;Miyoshi and Yamane 2007). Overviews of the models and data assimilation techniques used are summarized in Tables 1 and 2, Kobayashi et al. (2015), *5 Kalnay et al. (1996) (Mellor et al. 1994). Consequently, the topography is smoothed by applying Gaussian filters with 120 km and 105 km e-folding scale in the WNP and MC regions, respectively, to fulfill the condition where H i and H i+1 are bottom depths at adjacent grid cells. We use monthly temperature and salinity climatologies from the World Ocean Atlas 2018 (WOA18; Locarnini et al. 2019;Zweng et al. 2019) for the initial conditions over depths shallower than 1500 m and seasonal climatologies below 1500 m. A global ocean reanalysis dataset derived from the Simple Ocean Data Assimilation (SODA) version 3.7.2 (Carton et al. 2018) with horizontal resolution of 0.5° and 50 layers is adopted for the lateral boundary conditions of temperature, salinity, horizontal velocity, and sea surface height (SSH). The Japan 55-year Reanalysis (JRA-55; Kobayashi et al. 2015) with horizontal and temporal resolution of 1.25° and 6 h, respectively, is used for the atmospheric boundary conditions including air temperature and specific humidity at 2 m, wind velocity at 10 m, shortwave radiation, total cloud fraction, sea level pressure, and precipitation. River discharge is obtained from the Japan Aerospace Exploration Agency (JAXA)'s land surface and river simulation system, Today's Earth (TE)-Global (https:// www. eorc. jaxa. jp/ water/), with horizontal and temporal resolution of 0.25° and 3 h, respectively. To prevent filter divergence, the atmospheric and lateral boundary conditions except for rainfall and river discharge are perturbed (see Appendix 1; Torn et al. 2006;Kunii and Miyoshi 2012). The model is driven by wind stresses and heat and freshwater fluxes using bulk formulae, in which bulk coefficients are estimated by the Coupled Ocean-Atmosphere Response Experiment (COARE) version 3.5 bulk algorithm (Edson et al. 2013;Brodeau et al. 2017). The horizontal diffusivity coefficient is calculated by a Smagorinsky-type formulation with a coefficient of 0.1 (Smagorinsky et al. 1965) and is assumed to be one-fifth of the horizontal viscosity coefficient. The vertical diffusivity coefficient is estimated by the Level 2.5 version of Nakanishi and Niino (2009). The model with 100 ensemble members is spun up from an initial state of no motion from 1 January 2011 to 6 July 2015 using the perturbed atmospheric and lateral boundary conditions. During the spin-up period, simulated temperature and salinity are nudged toward the WOA18 monthly climatology with a 90-day timescale to prevent northward Kuroshio overshoot along the east coast of Japan.  Hunt et al. (2007) and Miyoshi and Yamane (2007)

3
The 3D-LETKF with 100 ensemble members is used to assimilate the following observations on a 1-day cycle: satellite SSTs from the Himawari-8 (Bessho et al. 2016;Kurihara et al. 2016Kurihara et al. , 2021 and Global Change Observation Mission-Water (GCOM-W; Shibata 2007; https:// gport al. jaxa. jp/ gpr/? lang= en), satellite sea surface salinity (SSS) from Soil Moisture and Ocean Salinity (SMOS; https:// www. esa. int/ Appli catio ns/ Obser ving_ the_ Earth/ SMOS) and Soil Moisture Active Passive (SMAP) version 4.3 (Meissner et al. 2018), satellite SSH anomalies from the CMEMS (https:// marine. coper nicus. eu/), and in situ temperatures and salinity from the Global Temperature and Salinity Profile Programme (GTSPP; Sun et al. 2010) and Advance automatic QC (AQC) Argo Data version 1.2a (https:// www. jamst ec. go. jp/ argo_ resea rch/ datas et/ aqc/ index_ datas et. html). Here, the mean dynamic ocean topography is assumed to be the simulated SSH averaged over 2012-14. We exclude satellite SSS within 100 km of the coasts, SSH for bottom topography shallower than 200 m, in situ temperatures and salinity duplicated between the GTSPP and AQC Argo, and observations without the best quality flags. To reduce the computational costs, we average satellite SST and SSS data at each model grid cell and then thin them at two grid intervals in longitude and latitude directions. The numbers of assimilated observations per day in the WNP and MC regions are approximately 1.5 × 10 5 and 0.5 × 10 5 , respectively.
Following Ohishi et al. (2022a, b), the system adopts (a) IAU (see Appendix 2; Bloom et al. 1996), (b) RTPP (see Appendix 3; Zhang et al. 2004;Kotsuki et al. 2017) to relax the forecast ensemble perturbations toward the analysis ensemble perturbations by 90% while maintaining the analysis ensemble mean, and (c) AOEI (see Appendix 4; Zhang et al. 2016;Minamide and Zhang 2017), in which the prescribed temperature, salinity, and SSH observational errors are set to 1.0 °C, 0.3, and 0.2 m, respectively. Covariance localization is applied using the Gaussian function with horizontal (vertical) localization scale LS of 300 km (100 m) following Miyazawa et al. (2012) and Penny et al. (2013), and thus the horizontal (vertical) localization function becomes zero beyond 2 √ 10∕3LS ≈ 1100 km (370 m). Using Fugaku and JAXA Supercomputer System Generation 3 (JSS3) consisting of the Fujitsu PRIMEHPC FX1000, we integrate the system from 7 July 2015 at the start date of the Himawari-8 observations to 31 December 2019, and are currently extending the period. Following Usui et al. (2017), the SSS (SST and temperature and salinity in the ocean interior) is nudged toward the monthly climatology from the WOA18 with 3-month (1.5-year) time scale during the assimilation period. If we use 100 nodes and hybrid Message Passing Interface (MPI)/Open Multi-Processing (OpenMP) parallelization, about 20-min computation time is required for one assimilation cycle. Daily mean 3D-outputs of the ensemble mean and spread and daily mean 2D-outputs at the sea surface of all ensemble members are saved. The volume for both 3D-ensemble mean and spread in the WNP (MC) region is 45 (34) gigabyte (GB) month −1 , if we save basic variables (temperature, salinity, and velocities, etc.) and all terms in heat and salinity budget equations (advection, diffusion, surface fluxes, and analysis increments, etc.). The volume of 2D-all ensemble members at the sea surface is 84 (60) GB month −1 in the WNP (MC) region. As described in Sect. 1, the analysis datasets constructed by the sbPOM-LETKF system are called LETKF-based Ocean Research Analysis (LORA), and hereafter the LORA in the WNP and MC regions are referred to as LORA-WNP and LORA-MC, respectively.

Validation
This study compares the validation of the LORA-WNP and -MC with regional ocean reanalysis and global observation datasets: the JCOPE2M (Miyazawa et al. 2017) and Archiving, Validation, and Interpretation of Satellite Oceanographic Data (AVISO; Ducet et al. 2000;Pujol et al. 2016;Taburet et al. 2019), respectively. An overview of the JCOPE2M is summarized in Tables 1 and 2 (see Miyazawa et al. (2017) for more details). The main differences between the JCOPE2M and LORA-WNP are (a) the JCOPE2M uses multi-scale 3D-VAR data assimilation while LORA-WNP uses LETKF; (b) neither river discharge nor satellite SSS assimilation is included in the JCOPE2M; and (c) there is no biologging temperature assimilation in the LORA-WNP. However, there are many similarities between the JCOPE2M and LORA-WNP; i.e., the sigma-coordinate regional ocean models, eddy-resolving horizontal resolution, number of sigma layers, system domains, and short assimilation intervals. The different models between the LORA and JCOPE2M are not likely to make large impacts on the forecast accuracy, and consequently the JCOPE2M is suitable for comparing validation results from the LORA-WNP. The JCOPE2M has been adopted for physical and biogeochemical oceanography studies (Kodaira and Waseda 2019;Hasegawa et al. 2021;Ishizu et al. 2021) and has provided operational information on Kuroshio and Oyashio prediction for the Kuroshio-Oyashio Watch website (https:// www. jamst ec. go. jp/ aplin fo/ kowat ch/e/). The AVISO is a widely used global observation-based SSH dataset with horizontal resolution of 0.25°. It is created by combining optimal interpolated satellite SSH anomalies and mean dynamics ocean topography estimated from a geoid model, satellite altimetry, and in situ data.
The root mean square deviations (RMSDs), biases, and anomaly correlation coefficients (ACCs) of the LORA-WNP and -MC, JCOPE2M, and AVISO are calculated relative to the following in situ independent observations in 2016-2018: surface horizontal velocity from drifter buoys of the Global Drifter Program (Elipot et al. 2016 , and CMEMS (http:// www. marin einsi tu. eu/ dashb oard/). We note that the drifter buoy observations may not be independent of the AVISO, as they are used for estimating the mean dynamic ocean topography in the AVISO.
Since there are few available independent SST data in the offshore regions, the RMSDs relative to the assimilated Himawari-8 SSTs are also calculated. The anomalies used to calculate ACCs are high-passed values from a 31-day  1), (2), and (3) mean that the LORA more (less) corresponds to observations than the JCOPE2M/AVISO in terms of the RMSDs, absolute biases, and ACC, respectively. Significant RMSD and absolute bias differences of the LORA with the JCOPE2M and AVISO are detected at a 99% confidence level using a bootstrap method with 10,000 cycles.

Climatological ensemble mean and spread
As Ohishi et al. (2022a, b) demonstrated that an EnKFbased ocean data assimilation system with a frequent assimilation interval might produce unrealistic fields if suitable settings are not applied, we first confirm the ensemble mean and spread in temperature, salinity, and horizontal flow at the sea surface and zonal (meridional) section along (across) the KE averaged over 2016-2018 (Figs. 2, 3, and4). Figure 2 shows that the LORA-WNP reproduces strong SST and SSS fronts along the quasistationary J1 jet [145°-155°E, 40°N] (Isoguchi et al. 2006) and strong currents along the Kuroshio and KE, as consistent with previous studies (cf. The temperature and salinity ensemble means in the ocean interior in the LORA-WNP represent the following typical water masses corresponding to the observations (Fig. 3): subtropical mode water (STMW; e.g., Masuzawa 1969;Suga and Hanawa 1995) characterized by a vertical stratification minimum around 25°-30°N and 200-400 m depth; high-salinity North Pacific tropical water (NPTW; e.g., Cannon 1966;Katsura et al. 2013) over 0-200 m depth south of 25°N; and low-salinity North Pacific intermediate water (NPIW; e.g., Talley 1993;Yasuda 1997) around 20°-35°N and 500-1000 m depth. Figure 4a, b shows that the strong eastward KE extends to ~ 1000 m depth, as consistent with mooring observations (cf. Figures 6 and 8 of Waterman et al. 2011). As reported by historical ship observations along the 137°E section (cf. Figure 4 of Oka et al. 2018), there are weak westward flows south of the KE and south of 25°N in association with the Kuroshio recirculation gyre and the North Equatorial Current, respectively (Fig. 4b). The large temperature spread is distributed over 200-500 m depth around the main thermocline along the KE and 0-200 m depth along the J1 jet (Fig. 3b). The salinity spread is large from the surface to the sub-surface layer along the KE and J1 jet (Fig. 3d). The large ensemble spread in the horizontal flow along the KE reaches ~ 1000 m depth (Fig. 4c, d, g, h).
Around the KE region, large variations associated with the meanders and eddies lead to the large ensemble spread in the horizontal flow. According to thermal wind theory, the sharp meridional temperature slope in the main thermocline is related to the strong zonal flow. Consequently, the large temperature spread in the main thermocline around the KE region is theoretically consistent with the large spread in the zonal flow. As the J1 jet position is quasi-stationary due to topographic effects (Isoguchi et al. 2006;Mitsudera et al. 2018), the ensemble spread in the horizontal flow along the J1 jet is not as large as that for the KE. A huge amount of turbulent heat is released around the SST fronts because of strong wind speed and large air-sea humidity and temperature differences (e.g., Ohishi et al. 2016Ohishi et al. , 2019aTozuka et al. 2017); therefore, the perturbed atmospheric forcing may result in the large temperature and salinity spread around the J1 jet. It is beyond the scope of this study to quantitatively investigate the causes of the formation of the large ensemble spread.
Therefore, the LORA-WNP appears to reproduce the climatological temperature, salinity, and horizontal flow fields. Around the KE region, the large ensemble spread in the temperature, salinity, and horizontal flow is probably caused by the large variations associated with the abundance of the meanders and eddies. Around the J1 jet region, the ensemble spread in temperature and salinity is large, whereas that in horizontal flow is relatively small, probably because the J1 jet position is stable due to the topographic effect.

Validation in the offshore region
The RMSDs of the LORA-WNP, JCOPE2M, and AVISO relative to surface zonal and meridional velocities from the drifter buoys are calculated for 5° longitude × 5° ). The RMSD ratios of the LORA-WNP relative to JCOPE2M indicate that the LORA-WNP more closely corresponds to the drifter buoys than the JCOPE2M in the mid-latitude region, especially along the KE (Figs. 5c and 6c). The LORA-WNP is slightly closer to  c The RMSD ratios of the LORA-WNP relative to the JCOPE2M. In a, the red dot with a white border denotes Katsura Island with the largest RMSDs among all stations. In c, circles and triangles indicate significant and no significant RMSD differences, respectively, at a 99% confidence level The statistical analyses indicate that the LORA-WNP is significantly closer to the surface zonal (meridional) velocity from the drifter buoys than the JCOPE2M by 3.8% (7.5%) and is significantly less close than the AVISO by 12% (17%).
Since there are no independent in situ SST data providing full coverage of the offshore region, the RMSDs are calculated relative to the Himawari-8 SSTs assimilated in both LORA-WNP and JCOPE2M (Fig. 7). The larger RMSDs are distributed at higher latitude in both datasets (Fig. 7a, b). This is consistent with Kurihara et al. (2016), who indicated that the Himawari-8 accuracy is lower at higher latitude, where the satellite zenith angle is larger. The LORA-WNP is closer to Himawari-8 SSTs than JCOPE2M in broad domains away from coasts, especially in the mid-latitude region, although the JCOPE2M corresponds better in the coastal region, eastern Japan Sea, and East China Sea (Fig. 7c). The RMSDs averaged over the whole analysis period and domain are 0.86 and 0.95 °C in the LORA-WNP and JCOPE2M, respectively, and the RMSDs of the LORA-WNP are significantly smaller than that of the JCOPE2M by 9.5%.
To validate the ocean-interior of the LORA-WNP and JCOPE2M, the RMSDs relative to independent observations by the KEO buoy south of the KE are calculated (cf. Figure 2a). The temperature RMSDs of both datasets increase at deeper depths (Fig. 8a). In 0-300 m depth the temperature of the LORA-WNP is significantly closer to the KEO buoy than that of the JCOPE2M, whereas it is less close below 300 m depth; however, the difference is not significant. This is likely because the prescribed vertical localization scale of 100 m in the LORA-WNP confines the impacts of assimilation of surface observations within the upper 0-370 m depth. Although this is a result from a single observation point, the better temperature fit in the LORA-WNP may be limited to the upper 300 m depth over the whole domain. In both datasets, the salinity RMSDs are smaller at 100-300 m depth and larger near the surface and below 300 m depth (Fig. 8b). Over the whole depth, the LORA-WNP shows larger salinity RMSDs than the JCOPE2M, but the difference is not significant. At most of the observation points, the RMSDs in the zonal and meridional velocities indicate that the LORA-WNP significantly corresponds better to the KEO buoy than the JCOPE2M (Fig. 8c, d). The results regarding the RMSDs of the SSTs and surface horizontal velocity at the KEO buoy are consistent with those from the Himawari-8 and drifter buoys, respectively (Figs. 5, 6, and 7). Figure 9 shows the RMSDs of the LORA-WNP and JCOPE2M relative to independent coastal SSTs over the analysis period. The RMSDs of the LORA-WNP exceed 1 °C at almost all stations, and they are especially large in northern Japan (Fig. 9a). Although the JCOPE2M also shows relatively large RMSDs of 1-2 °C (Fig. 9b), the RMSDs in the JCOPE2M are smaller than the LORA-WNP over almost all stations (Fig. 9c). To gain insight into the causes of the large RMSDs in both datasets, the SST time series at Katsura Island on the northeast coast of Japan (white circle in Fig. 9a), where the RMSD is the largest among all stations, are shown in Fig. 10. The LORA-WNP and JCOPE2M have high SST biases in the winter season, and the biases are higher in the LORA-WNP than in the JCOPE2M. As shown in Fig. 11, for most of the stations, both datasets have high SST biases over 0.5 °C (Fig. 11a, b), and the biases are higher in the LORA-WNP than the JCOPE2M (Fig. 11c). The spatial pattern of the absolute bias ratios is similar to the RMSD To remove the contributions from the biases, the ACCs are calculated using SSTs high-passed by a 31-day moving filter (Fig. 12). At all stations, the ACCs are positive in both datasets (Fig. 12a, b). The ACC ratios of the LORA-WNP relative to the JCOPE2M indicate that the LORA-WNP has higher correlations with in situ SST data in most of the stations than the JCOPE2M (Fig. 12c). Therefore, in the coastal region, the RMSDs in the LORA-WNP are larger than in the JCOPE2M because of the higher SST biases, but the sub-monthly temporal variations are better reproduced in the LORA-WNP.

Validation in the nearshore region
Although we adopt the JRA-55 for the atmospheric forcing in the LORA, the horizontal resolution of 1.25° might not be sufficient to resolve coastlines. As a result, it seems that the fine coastline in the data assimilation system is not well captured by the coarse land-sea mask in the JRA-55, and thus weak land-surface wind is assigned to nearshore system grid cells. Sensitivity experiments on horizontal resolution for atmospheric reanalysis datasets are currently underway. The AOEI might also contribute to the high SST biases in the LORA-WNP, as it reduces the analysis increments by inflating the observation errors where the innovation is large. However, as the AOEI substantially improves the low-salinity NPIW structure around the KE region (Ohishi et al. 2022b), it is better to apply the AOEI.
The RMSDs of the LORA-WNP, JCOPE2M, and AVISO relative to the SSH anomalies from the tide gauges over the analysis period are calculated, although there may be room for improvement in the coastal atmospheric forcing (Fig. 13). The RMSD ratios of the LORA-WNP relative to the JCOPE2M indicate significantly lower RMSDs in the LORA-WNP in northern Japan but significantly larger RMSDs in the other regions than in the JCOPE2M  (Fig. 13c). The RMSDs of the AVISO are significantly smaller than that of the LORA-WNP over almost all stations (Fig. 13e). These findings are roughly consistent with the results from the drifter buoys.

Winter and summer climatological ensemble mean and spread
Southeast Asia undergoes remarkable seasonal variations of both atmosphere and ocean (Fig. 14). In the boreal winter with the northeasterly monsoon, a cyclonic circulation develops in the SCS (Fig. 14b). The cold tongue along the eastern coast of Vietnam is induced by southward cold advection and turbulent heat release ( Fig. 14a; Thompson et al. 2016;Seow and Tozuka 2019). In contrast, the southeasterly monsoon dominates in boreal summer, and the SCS is characterized by an anticyclonic circulation (Fig. 14d). Near the east coast of Vietnam, the coastal upwelling caused by the southeasterly wind results in localized cool SST ( Fig. 14c; Xie et al. 2003;Dippner et al. 2007). Figures 15, 16, and 17 show the ensemble mean and spread of the SST, SSH, and SSS in the LORA-MC in February and August averaged over 2016-2018. In the SCS, the LORA-MC reproduces the cold tongue associated with the cyclonic circulation in boreal winter and the localized cool SST and anticyclonic circulation in boreal summer, as consistent with the observations (Figs. 15a, c, and 16a, c). In the summer rainy season with the large river discharge from the Mekong River, the salinity near the east coast of Vietnam is lower than in winter (Fig. 17a, c).
The SST ensemble spread is large around the Kuroshio region in February and relatively small in other regions (Fig. 15b). In August, the relatively large SST spread near the east coast of Vietnam may be related to the sensitivity of wind-induced cool SST to the perturbed atmospheric forcing (Fig. 15d). Large SSH spread is found around the Kuroshio region only in August when the Kuroshio transport is seasonally large because of the joint effect of baroclinicity and bottom relief (JEBAR; Sakamoto and Yamagata 1996; Fig. 16b, d). The SSS ensemble spread is large around the Kuroshio region in both February and August (Fig. 17b, d). The position of the salinity spread maximum corresponds to the large SST and SSH spread in winter and summer, respectively, and thus the formation mechanisms of the large salinity spread may be different. As in the LORA-WNP system, it is beyond the scope of the present study to quantitatively investigate the processes responsible for the large ensemble spread.

Validation of the LORA-MC
As described in Sect. 1, there are few in situ and independent observations in the MC region (cf. Figures 1 and   18). For example, there are more than 10 drifter buoy observations per month per bin in the offshore region, but only a few observations are available around the coastal equatorial region (Fig. 18). In this subsection, the RMSDs and biases of the LORA-MC are calculated relative to the independent observations by the drifter buoys and tide gauges and the assimilated Himawari-8 SSTs (Figs. 19, 20, and 21).
As in Sect. 4.2, the RMSDs of the LORA-MC and AVISO relative to the drifter buoys are calculated for the surface zonal and meridional velocity fields in 5° longitude × 5° latitude bins (Fig. 19). In the offshore region, the RMSDs in both zonal and meridional velocities are relatively small in the LORA-MC and AVISO. There are large RMSDs over 0.3 m s −1 in the LORA-MC around the Kuroshio region and parts of the coastal equatorial region (near the eastern coast of Kalimantan Island and the western boundary of the system; Fig. 19a, b). In the AVISO, large RMSDs over 0.3 m s −1 are distributed more broadly around the coastal equatorial region than in the LORA-MC (Fig. 19c, d). The RMSD ratios of both zonal and meridional velocities of the LORA-MC relative to the AVISO indicate that the LORA-MC is closer to the drifter buoys than the AVISO around the equatorial coastal region and is less close in most of the offshore region (Fig. 19e, f). The larger RMSDs of the AVISO in the equatorial coastal region may be because the small number of drifter observations results in less accurate mean dynamical ocean topography. For the surface zonal and meridional velocities, the spatiotemporally averaged RMSDs of the LORA-MC are 0.275 and 0.265 m s −1 for the LORA-MC, respectively, and are significantly and larger than that of the AVISO of 0.254 and 0.223 m s −1 by 8.3 and 18.8%.
The RMSDs and biases relative to the assimilated Himawari-8 SSTs are calculated over the analysis period (Fig. 20). There are small (large) RMSDs in the offshore (nearshore)  (Fig. 20a), and the RMSD averaged over the whole analysis period and domain is 0.62 °C. As in the LORA-WNP, the high SST biases relative to the Himawari-8 are distributed near the coastal regions in the LORA-MC (Fig. 20b), and thus they result in the large RMSDs. As discussed in Sect. 4.3, more accurate atmospheric forcing around nearshore regions would be required to improve the accuracy in the LORA-MC.
The RMSDs of the LORA-MC and AVISO over the analysis period are calculated relative to the SSH anomalies from tide gauges (Fig. 21). The LORA-MC and AVISO have smaller RMSDs around the equatorial coastal region than in the Kuroshio region (Fig. 21a, b). The RMSD ratios of the LORA-MC relative to the AVISO indicate that the AVISO significantly better fits to the tide gauge observations than the LORA-MC at almost all stations (Fig. 21c). Here, the LORA-MC and AVISO are positively correlated with the observations at all stations, although the AVISO has higher correlation than the LORA-MC at almost all stations (figure not shown). Consequently, the LORA-MC reproduces temporal variations in the SSH to some extent, but the AVISO corresponds better than the LORA-MC at almost all stations.

Summary
We have constructed LETKF-based high-resolution regional ocean analysis datasets named LORA (LETKF-based Ocean Research Analysis) in the WNP and MC regions, i.e., LORA-WNP and LORA-MC, respectively. The validation results of the LORA, JCOPE2M, and AVISO were compared by calculating the RMSDs, biases, and ACCs relative to the independent observations (the surface drifter buoys, KEO buoy, in-situ coastal observations, and tide gauges) and assimilated Himawari-8 SSTs.
The LORA-WNP appears to reproduce well the climatological temperature, salinity, horizontal flow, and SSH fields. It has the large ensemble spread in the SST, SSS, SSH, and surface horizontal flow fields around the KE region, whereas the ensemble spread is large only in the SST and SSS fields along the J1 jet (Figs. 2, 3, and 4). The stationary position of the J1 jet caused by topographic effects probably results in the relatively small ensemble spread in the horizontal flow and SSH.
The RMSDs of the LORA-WNP relative to the horizontal flow measured by the surface drifter buoys and the KEO buoy indicate that the LORA-WNP is less close to the independent observations than the JCOPE2M in the subtropical region but is closer in the mid-latitude regions, especially along the KE, although the AVISO is the closest of the three datasets (Figs. 5,6,and 8c,d). The Himawari-8, KEO buoy, and coastal SST observations show that the SST of the LORA-WNP broadly better fits to the observations than that of the JCOPE2M in the subtropical and mid-latitude regions except for the coastal regions, eastern part of the Japan Sea, and East China Sea (Fig. 7). The smaller temperature RMSDs of the LORA-WNP than of the JCOPE2M may be limited to the depth shallower than 300 m over the whole domain, likely because of the prescribed vertical localization scale of 100 m (i.e., cutoff length scale of 370 m) in the LORA-WNP, although this is a result from a single independent observation from the KEO buoy south of the KE (Fig. 8a). In the coastal regions, the higher SST biases in the LORA-WNP than in the JCOPE2M result in larger RMSDs, but the sub-monthly temporal variations are better reproduced in the LORA-WNP than the JCOPE2M (Figs. 9,10,11,and 12). The salinity RMSDs are larger in the LORA-WNP than the JCOPE2M over the whole depth at the KEO buoy, although the difference is not significant (Fig. 8b).
The LORA-MC also represents realistic climatological fields: the cold tongue along the east coast of Vietnam associated with the cyclonic circulation in boreal winter, and wind-induced localized cool SST near the east coast of Vietnam and anticyclonic circulation in the SCS in boreal summer (Figs. 15a,c,16a,c,and 17a,c). Around the Kuroshio region, there is large ensemble spread in the SST (surface horizontal flow) in winter (summer), and the SSS spread is large in both seasons (Figs. 15b,d,16b,d,and 17b,d). In the rest of the MC region, the ensemble spread in SST, SSS, and surface horizontal flow fields is relatively small.
The RMSDs of the LORA-MC relative to the drifter buoys are smaller than those of the AVISO around the equatorial coastal region, probably because of less accurate mean dynamical ocean topography in the AVISO caused by the small number of drifter buoy observations, whereas they are larger in the offshore region (Figs. 18 and 19). The results using the Himawari-8 SST show large RMSDs around the nearshore region because of high SST biases, as in the LORA-WNP, and small RMSDs in the offshore region (Fig. 20). The RMSDs relative to the tide gauges indicate better fit of the SSH in the AVISO than the LORA-MC at almost all stations. In summary, the LORA-WNP and LORA-MC have sufficient accuracy to be used in a variety of geoscience research areas, as well as for fisheries and marine transport applications and marine environment consultants.
Although various institutions are now developing EnKFbased ocean data assimilation systems, no high-resolution EnKF-based ocean data assimilation products are available in the Pacific region to the best of the authors' knowledge. Multiple global high-resolution reanalysis datasets are currently available as described in Sect. 1, whereas the LORA has limited domains in the WNP and MC regions. This allows higher resolution analysis in the LORA. Moreover, the LORA-WNP and -MC have unique capability to visualize the spatiotemporal changes of the uncertainty represented by the ensemble spread. Since the LORA includes all terms in the heat (salinity) budget equation including the advection, diffusion, surface heat (freshwater) fluxes, and analysis increments, the LORA can be used to quantitatively investigate the causes of spatiotemporal temperature and salinity variations. The LORA-WNP and -MC would be useful for geosciences research in the mid-latitude and equatorial regions, respectively, except for nearshore areas, where the accuracy of the LORA-WNP and -MC seems to be not as reliable. For further improvement, our future study will investigate the impact of Fig. 21 RMSDs of the a LORA-MC and b AVISO relative to SSH anomalies from tide gauges. c The RMSD ratios of the LORA-MC relative to the AVISO. In c, circles and triangles indicate significant and no significant RMSD differences, respectively, at a 99% confidence level using high-resolution atmospheric forcing because nearshore weak wind speed may be an artifact of low-resolution atmospheric forcing and the reason for the nearshore high SST biases in the LORA. Furthermore, we will investigate the predictability for the Kuroshio path shift to a large meander path in summer 2017 (Sugimoto et al. 2020;Sakajo et al. 2022) by conducting ensemble prediction experiments. Although the system still includes tuning parameters for ensemble size, localization scales, observation errors, and uncertainty in the atmospheric forcing, which might improve the accuracy, it is beyond the scope of the present study to survey them. We are now planning to extend the analysis period and to construct a near-real-time ensemble forecast system. We will publish the daily-mean sea-surface analyses for all 100 ensemble members as well as 3D-ensemble mean and spread in the LORA-WNP and LORA-MC on a web site in 2023.

Appendix 1. Perturbed atmospheric and lateral boundary conditions
To avoid the filter divergence in which the ensemble spread becomes exceedingly small, the atmospheric and lateral boundary conditions in the system are perturbed following Kunii and Miyoshi (2012) and Torn et al. (2006), respectively. The atmospheric forcing of the i th ensemble member at a time t , w (i) (t) , is represented as where w(t) is atmospheric forcing at time t , (= 0.2) is an arbitrary constant, w t + t i is atmospheric forcing at the same time as w(t) but in a different year, and n (= 100) is the ensemble size. Here, the year in w t + t i is changed every month, and the ensemble mean of w (i) (t) is equal to w(t).
The lateral boundary conditions are perturbed using the monthly mean global reanalysis dataset from the SODA version 3.7.2 for different years. This corresponds to the ensemble mean of the perturbed lateral boundary condition being equivalent to the monthly climatology.

Appendix 2. Incremental analysis update (IAU)
The IAU procedure for one assimilation cycle is as follows: (i) conduct the model integration up to the middle of the assimilation window; (ii) assimilate observations composited within the window and save the analysis increments in temperature, salinity, and horizontal velocity; and (iii) conduct the integration over the window, adding the increments equally distributed to each timestep. The IAU method can reduce noise from high-frequency gravity waves associated with the initial shock, but the computational cost for the model integration becomes 1.5 times higher than the standard method where the analyses performed at the beginning of the window are used for the initial conditions.

Appendix 3. Relaxation-to-prior perturbation (RTPP)
The RTPP relaxes the analysis ensemble perturbations toward the forecast ensemble perturbations, maintaining the analysis ensemble mean, as represented by is the ensemble perturbation matrix whose the i th column consists of the perturbations of the i th ensemble member, where x (i) and x are the state vectors of the i th ensemble member and ensemble mean, the superscripts a and b denote analysis and forecast, the subscripts inf and orig indicate inflated and original (i.e., before inflation), respectively, and RTPP is the relaxation parameter. This study adopts the combination of the IAU and RTPP with 90% relaxation (i.e., RTPP = 0.9 ), as Ohishi et al. (2022a) demonstrate that this improves both balance and accuracy for sensitivity experiments on the IAU and various covariance inflation methods.

Appendix 4. Adaptive observation error inflation (AOEI)
The AOEI suppresses erroneous analysis increments associated with systematic errors, biases, and representation errors by adaptively inflating the observation errors (Zhang et al. 2016;Minamide and Zhang 2017). According to innovation statistics from Desroziers et al. (2005), Here, ⟨•⟩ indicates the statistical expectation, and d o b (= y − x b ) is an innovation vector, where y , , and x b denote an observation vector, linear observation operator, and forecast ensemble mean vector, respectively. and are the forecast and observation error covariance matrices, respectively. Equation (6) in scalar form might be able to estimate the observation error est−o as follows: where H(x b ) is the forecast ensemble spread in observation space. The AOEI method chooses larger observation 2 errors o by comparing the estimated and prescribed errors as below: where pre−o is the prescribed observation error. Ohishi et al. (2022b) demonstrate that the AOEI substantially improves the low-salinity NPIW (Talley 1993;Yasuda 1997) around the KE region, where both ensemble spread and innovation are large, as well as the geostrophic balance and accuracy in an EnKF-based ocean data assimilation system.

Conflict of interest
The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.