1 Introduction

Data assimilation optimally combines simulation and observations with statistical methods and dynamical systems theories, and it results in the statistically best estimates (i.e., analyses). With the development of numerical simulation and satellite and in situ observations, various operational centers and research institutions have produced atmosphere and ocean analysis datasets (e.g., Kalnay et al. 1996; Carton et al. 2018). Historical analysis datasets reconstructed using a single consistent data assimilation system are specifically referred to as reanalyses. These analysis and reanalysis datasets are utilized for a variety of geoscience studies, as they provide accurate gridded datasets without missing values.

There are various data assimilation methods suitable for high-dimensional ocean data assimilation systems, such as three- and four-dimensional variational assimilation (3D- and 4D-VAR, respectively), Kalman filters, ensemble optimal interpolation (EnOI), and ensemble Kalman filters (EnKFs). As summarized in review papers (Balmaseda et al. 2015; Martin et al. 2015), the 3D-VAR seems to be the most used among the currently available ocean reanalysis datasets. Global high-resolution reanalysis datasets, such as Ocean ReAnalysis System 5 (ORAS5; Zuo et al. 2019), Global Ocean Physics Reanalysis (GLORYS12; https://data.marine.copernicus.eu/product/GLOBAL_MULTIYEAR_PHY_001_030/description), and Japan Coastal Ocean Predictability Experiments-Forecasting Global Ocean (JCOPE-FGO; Kido et al. 2022), have been recently produced by the 3D-VAR systems. Although the EnOI is applied to the global ocean and coupled data assimilation systems of Bluelink ReANalysis (BRAN; Chamberlain et al. 2021) and Goddard Earth Observing System Subseasonal to Seasonal prediction (GEOS-S2S) system version 2 (Molod et al. 2020), respectively, the background forecast matrix in the EnOI is static and differs from the EnKF. The EnKF is only adopted by a single reanalysis dataset referred to as TOPAZ4 (Sakov et al. 2012; Xie et al. 2017). TOPAZ4 is created using a regional ocean-sea ice data assimilation system constructed for the North Atlantic and Arctic regions with high horizontal resolution of 12–16 km, but the dataset only for the Arctic region is released through the Copernicus Marine Environment Monitoring Service website (CMEMS; https://marine.copernicus.eu/).

In the Pacific region, several high-resolution regional analysis datasets have been produced, for example, the Japan Fisheries Research and Education Agency-Regional Ocean Modelling System II (FRA-ROMS II; Kuroda et al. 2017) with 3D-VAR, the Japan Coastal Ocean Predictability Experiment 2M (JCOPE2M; Miyazawa et al. 2017) with multi-scale 3D-VAR (Li et al. 2015), the Four-dimensional variational Ocean Re-analysis for the Western North Pacific over 30 years (FORA-WNP30; Usui et al. 2017) with the 4D-VAR, and the Data assimilation Research of the East Asian Marine System (DREAMS; Hirose et al. 2013) with a Kalman filter with reduced-order approximation (Fukumori and Malanotte-Rizzoli 1995). To the best of the authors’ knowledge, there are no EnKF-based high-resolution analysis datasets available for the Pacific region.

We have developed an eddy-permitting EnKF-based regional ocean data assimilation system for the western North Pacific (WNP) region (Ohishi et al. 2022a; Ohishi et al. 2022b), in which satellite and in situ observations are assimilated at a shorter interval (1 day) than 5 days used for the TOPAZ4. Sensitivity experiments have demonstrated that a combination of relaxation-to-prior perturbation (RTPP; Zhang et al. 2004; Kotsuki et al. 2017), incremental analysis update (IAU; Bloom et al. 1996), and adaptive observation error inflation (AOEI; Minamide and Zhang 2017) are required to assure sufficient dynamical balance and accuracy. With the recent enhancement of high-performance computing resources, we are now able to integrate EnKF-based ocean data assimilation systems with sufficiently high horizontal resolution to resolve fronts and eddies. This study constructs two eddy-resolving analysis datasets for the WNP and Maritime Continent (MC) regions (Fig. 1a), where an infrared sensor on the geostationary Himawari-8 satellite has observed sea surface temperatures (SSTs) used for assimilation since July 2015 (Bessho et al. 2016; Kurihara et al. 2016).

Fig. 1
figure 1

a Topography from a 1 arc-minute global relief model of Earth’s surface (ETOPO1; Amante and Eakins 2009). Frequency of in situ assimilated observations in 2016–2018 in 5° longitude × 5° latitude bins in the b western North Pacific (WNP) and c Maritime Continent (MC) regions. In a, the orange (cyan) rectangle box denotes the WNP (MC) domain

In the WNP region, numerous fronts and eddies are distributed around western boundary currents and their extension regions (e.g., Kida et al. 2015). It has long been recognized that the mid-latitude atmosphere forces the ocean field through surface turbulent heat fluxes (Cayan 1992), entrainment at the base of the mixed layer (Miller et al. 1994), and Ekman transport (Yasuda and Hanawa 1997). With the progress of satellite observations and numerical simulation, the impact of ocean fronts and eddies on atmospheric stability and pressure has been highlighted (e.g., Nonaka and Xie 2003; Minobe et al. 2008; Nakamura et al. 2008), and thus mid-latitude air–sea interaction has attracted much attention. Studies on coupled atmosphere–ocean data assimilation that consider the exchange and covariance of uncertainty between the atmosphere and ocean fields have recently been performed (e.g., Sluka et al. 2016; Penny et al. 2019). Ensemble ocean analysis datasets would be useful not only for ocean research but also for atmospheric research to assess the impacts of the uncertainty of the oceanic field on the atmosphere.

There are few in situ observations in the MC region, especially around the shallow equatorial coastal area (Gulf of Thailand, Java Sea, Celebes Sea, Banda Sea, Arafura Sea, and Timor Sea) compared with the WNP region (Fig. 1b, c). The Intergovernmental Oceanographic Commission (IOC) Sub-Commission for the western Pacific (WESTPAC) has supported sustained observations and services around the Indo-Pacific region over long periods. An international project “CoastPredict” (https://www.coastpredict.org/) has recently been endorsed by the United Nations Educational, Scientific and Cultural Organization (UNESCO)/IOC as a program under the United Nations Decade of Ocean Science for Sustainable Development, and one of its objectives is the co-design and implementation of an integrated coastal ocean observing and forecasting system. Therefore, it is expected that the coastal observation networks around the MC region will be enhanced in the next few decades. The WESTPAC has already published an ocean forecasting system (https://iocwestpac.org/ofs/897.html), but details including the model configuration, data assimilation, and validation have not yet been reported. In the South China Sea (SCS), cyclonic (anticyclonic) ocean circulation develops during the boreal winter northeasterly (summer southwesterly) monsoon, and thus the atmospheric and oceanic fields undergo remarkable seasonal variations (e.g., Xie et al. 2003; Thompson et al. 2016). Consequently, accurate analysis datasets are required for fisheries, shipping companies, and environmental consultants as well as physical and biogeochemical oceanographers.

This study aims to create new regional ocean research analysis datasets, the local ensemble transform Kalman filter (LETKF)-based ocean research analysis (LORA), in the WNP and MC regions, and to provide detailed descriptions including system setting and validation comparison. Configurations of EnKF-based regional ocean data assimilation systems are described in Sect. 2, and the methods used to evaluate and compare validation of the LORA and existing datasets are provided in Sect. 3. The climatological ensemble mean and spread as well as the results of validation comparison of the LORA in the WNP are shown in Sect. 4, and those for the MC region are given in Sect. 5. A summary is presented in the final section.

2 LETKF-based ocean data assimilation system

This study uses an EnKF-based regional ocean data assimilation system, sbPOM-LETKF (Ohishi et al. 2022a; Ohishi et al. 2022b), which consists of the Stony Brook Parallel Ocean Model (sbPOM; Jordi and Wang 2012) and the local ensemble transform Kalman filter (LETKF; Hunt et al. 2007; Miyoshi and Yamane 2007). Overviews of the models and data assimilation techniques used are summarized in Tables 1 and 2, respectively.

Table 1 Overview of ocean models used in the LORA-WNP and -MC and JCOPE2M. Abbreviations are as follows: LORA, LETKF-based Ocean Research Analysis; LETKF, local ensemble transform Kalman filter; WNP, western North Pacific; MC, Maritime Continent; JCOPE2M, Japan Coastal Ocean Predictability Experiment 2M; sbPOM, Stony Brook Parallel Ocean Model; POM, Princeton Ocean Model; JRA-55, Japan 55-year Reanalysis; NCEP–NCAR, National Centers for Environmental Prediction–National Center for Atmospheric Research); TE-Global, Today’s Earth-Global; SODA, Simple Ocean Data Assimilation; WOA18, World Ocean Atlas 2018; FRA, Japan Fisheries Research and Education Agency
Table 2 Overview of data assimilation in the LORA-WNP and -MC and JCOPE2M. Abbreviations are as follows: 3D-VAR, three dimensional-variational assimilation; GCOM-W, Global Change Observation Mission-Water; MGDSST, Merged Satellite and In situ data Global Daily Sea Surface Temperature; MODIS, Moderate Resolution Imaging Spectroradiometer; SMOS, Soil Moisture and Ocean Salinity; SMAP, Soil Moisture Active Passive; CMEMS, Copernicus Marine Environment Monitoring Service; GTSPP, Global Temperature and Salinity Profile Programme; AQC, Advance automatic QC; IAU, incremental analysis update; RTPP, relaxation-to-prior perturbation; AOEI, adaptive observation error inflation

The sbPOM is constructed for the WNP [108°E–180°, 12°–50°N] and MC [95°–136°E, 18°S–30°N] regions with horizontal resolution of 0.1° and 50 sigma layers (Fig. 1a), which corresponds to 722 × 386 × 50 and 418 × 482 × 50 grid points, respectively. The bottom topography is obtained from a 1 arc-minute global relief model of Earth’s surface (ETOPO1; Amante and Eakins 2009). In sigma coordinate models, pressure gradient errors develop over steep bottom slopes (Mellor et al. 1994). Consequently, the topography is smoothed by applying Gaussian filters with 120 km and 105 km e-folding scale in the WNP and MC regions, respectively, to fulfill the condition \(\left|{H}_{i+1}-{H}_{i}\right|/\left|{H}_{i+1}+{H}_{i}\right|<0.2\), where \({H}_{i}\) and \({H}_{i+1}\) are bottom depths at adjacent grid cells. We use monthly temperature and salinity climatologies from the World Ocean Atlas 2018 (WOA18; Locarnini et al. 2019; Zweng et al. 2019) for the initial conditions over depths shallower than 1500 m and seasonal climatologies below 1500 m. A global ocean reanalysis dataset derived from the Simple Ocean Data Assimilation (SODA) version 3.7.2 (Carton et al. 2018) with horizontal resolution of 0.5° and 50 layers is adopted for the lateral boundary conditions of temperature, salinity, horizontal velocity, and sea surface height (SSH). The Japan 55-year Reanalysis (JRA-55; Kobayashi et al. 2015) with horizontal and temporal resolution of 1.25° and 6 h, respectively, is used for the atmospheric boundary conditions including air temperature and specific humidity at 2 m, wind velocity at 10 m, shortwave radiation, total cloud fraction, sea level pressure, and precipitation. River discharge is obtained from the Japan Aerospace Exploration Agency (JAXA)’s land surface and river simulation system, Today’s Earth (TE)-Global (https://www.eorc.jaxa.jp/water/), with horizontal and temporal resolution of 0.25° and 3 h, respectively. To prevent filter divergence, the atmospheric and lateral boundary conditions except for rainfall and river discharge are perturbed (see Appendix 1; Torn et al. 2006; Kunii and Miyoshi 2012). The model is driven by wind stresses and heat and freshwater fluxes using bulk formulae, in which bulk coefficients are estimated by the Coupled Ocean–Atmosphere Response Experiment (COARE) version 3.5 bulk algorithm (Edson et al. 2013; Brodeau et al. 2017). The horizontal diffusivity coefficient is calculated by a Smagorinsky-type formulation with a coefficient of 0.1 (Smagorinsky et al. 1965) and is assumed to be one-fifth of the horizontal viscosity coefficient. The vertical diffusivity coefficient is estimated by the Level 2.5 version of Nakanishi and Niino (2009). The model with 100 ensemble members is spun up from an initial state of no motion from 1 January 2011 to 6 July 2015 using the perturbed atmospheric and lateral boundary conditions. During the spin-up period, simulated temperature and salinity are nudged toward the WOA18 monthly climatology with a 90-day timescale to prevent northward Kuroshio overshoot along the east coast of Japan.

The 3D-LETKF with 100 ensemble members is used to assimilate the following observations on a 1-day cycle: satellite SSTs from the Himawari-8 (Bessho et al. 2016; Kurihara et al. 2016, 2021) and Global Change Observation Mission-Water (GCOM-W; Shibata 2007; https://gportal.jaxa.jp/gpr/?lang=en), satellite sea surface salinity (SSS) from Soil Moisture and Ocean Salinity (SMOS; https://www.esa.int/Applications/Observing_the_Earth/SMOS) and Soil Moisture Active Passive (SMAP) version 4.3 (Meissner et al. 2018), satellite SSH anomalies from the CMEMS (https://marine.copernicus.eu/), and in situ temperatures and salinity from the Global Temperature and Salinity Profile Programme (GTSPP; Sun et al. 2010) and Advance automatic QC (AQC) Argo Data version 1.2a (https://www.jamstec.go.jp/argo_research/dataset/aqc/index_dataset.html). Here, the mean dynamic ocean topography is assumed to be the simulated SSH averaged over 2012–14. We exclude satellite SSS within 100 km of the coasts, SSH for bottom topography shallower than 200 m, in situ temperatures and salinity duplicated between the GTSPP and AQC Argo, and observations without the best quality flags. To reduce the computational costs, we average satellite SST and SSS data at each model grid cell and then thin them at two grid intervals in longitude and latitude directions. The numbers of assimilated observations per day in the WNP and MC regions are approximately 1.5 × 105 and 0.5 × 105, respectively.

Following Ohishi et al. (2022a, b), the system adopts (a) IAU (see Appendix 2; Bloom et al. 1996), (b) RTPP (see Appendix 3; Zhang et al. 2004; Kotsuki et al. 2017) to relax the forecast ensemble perturbations toward the analysis ensemble perturbations by 90% while maintaining the analysis ensemble mean, and (c) AOEI (see Appendix 4; Zhang et al. 2016; Minamide and Zhang 2017), in which the prescribed temperature, salinity, and SSH observational errors are set to 1.0 °C, 0.3, and 0.2 m, respectively. Covariance localization is applied using the Gaussian function with horizontal (vertical) localization scale \(LS\) of 300 km (100 m) following Miyazawa et al. (2012) and Penny et al. (2013), and thus the horizontal (vertical) localization function becomes zero beyond \(2\sqrt{10/3}LS\approx 1100 \; \mathrm{km}\;(370 \; \mathrm{m})\).

Using Fugaku and JAXA Supercomputer System Generation 3 (JSS3) consisting of the Fujitsu PRIMEHPC FX1000, we integrate the system from 7 July 2015 at the start date of the Himawari-8 observations to 31 December 2019, and are currently extending the period. Following Usui et al. (2017), the SSS (SST and temperature and salinity in the ocean interior) is nudged toward the monthly climatology from the WOA18 with 3-month (1.5-year) time scale during the assimilation period. If we use 100 nodes and hybrid Message Passing Interface (MPI)/Open Multi-Processing (OpenMP) parallelization, about 20-min computation time is required for one assimilation cycle. Daily mean 3D-outputs of the ensemble mean and spread and daily mean 2D-outputs at the sea surface of all ensemble members are saved. The volume for both 3D-ensemble mean and spread in the WNP (MC) region is 45 (34) gigabyte (GB) month−1, if we save basic variables (temperature, salinity, and velocities, etc.) and all terms in heat and salinity budget equations (advection, diffusion, surface fluxes, and analysis increments, etc.). The volume of 2D-all ensemble members at the sea surface is 84 (60) GB month−1 in the WNP (MC) region. As described in Sect. 1, the analysis datasets constructed by the sbPOM-LETKF system are called LETKF-based Ocean Research Analysis (LORA), and hereafter the LORA in the WNP and MC regions are referred to as LORA-WNP and LORA-MC, respectively.

3 Validation

This study compares the validation of the LORA-WNP and -MC with regional ocean reanalysis and global observation datasets: the JCOPE2M (Miyazawa et al. 2017) and Archiving, Validation, and Interpretation of Satellite Oceanographic Data (AVISO; Ducet et al. 2000; Pujol et al. 2016; Taburet et al. 2019), respectively. An overview of the JCOPE2M is summarized in Tables 1 and 2 (see Miyazawa et al. (2017) for more details). The main differences between the JCOPE2M and LORA-WNP are (a) the JCOPE2M uses multi-scale 3D-VAR data assimilation while LORA-WNP uses LETKF; (b) neither river discharge nor satellite SSS assimilation is included in the JCOPE2M; and (c) there is no biologging temperature assimilation in the LORA-WNP. However, there are many similarities between the JCOPE2M and LORA-WNP; i.e., the sigma-coordinate regional ocean models, eddy-resolving horizontal resolution, number of sigma layers, system domains, and short assimilation intervals. The different models between the LORA and JCOPE2M are not likely to make large impacts on the forecast accuracy, and consequently the JCOPE2M is suitable for comparing validation results from the LORA-WNP. The JCOPE2M has been adopted for physical and biogeochemical oceanography studies (Kodaira and Waseda 2019; Hasegawa et al. 2021; Ishizu et al. 2021) and has provided operational information on Kuroshio and Oyashio prediction for the Kuroshio–Oyashio Watch website (https://www.jamstec.go.jp/aplinfo/kowatch/e/). The AVISO is a widely used global observation-based SSH dataset with horizontal resolution of 0.25°. It is created by combining optimal interpolated satellite SSH anomalies and mean dynamics ocean topography estimated from a geoid model, satellite altimetry, and in situ data.

The root mean square deviations (RMSDs), biases, and anomaly correlation coefficients (ACCs) of the LORA-WNP and -MC, JCOPE2M, and AVISO are calculated relative to the following in situ independent observations in 2016–2018: surface horizontal velocity from drifter buoys of the Global Drifter Program (Elipot et al. 2016); SSH anomalies from tide gauges of the University of Hawaii Sea Level Center (UHSLC; https://uhslc.soest.hawaii.edu/); temperature and salinity over 1–525 m depth and horizontal velocity over 8–36 m depth at [144.6°E, 32.3°N] south of the Kuroshio Extension (KE) from the Kuroshio Extension Observatory (KEO) buoy (https://www.pmel.noaa.gov/ocs/); and SSTs around the coasts of Japan and Korea from the Japan Fisheries Research and Education Agency (FRA; https://buoy.nrifs.affrc.go.jp/), Japan Oceanographic Data Center (JODC; https://jdoss1.jodc.go.jp/vpage/coastal.html), Kochi Prefecture Fisheries Promotion Department Fisheries Experiment Station (https://www.suisan.tosa.pref.kochi.lg.jp/), and CMEMS (http://www.marineinsitu.eu/dashboard/). We note that the drifter buoy observations may not be independent of the AVISO, as they are used for estimating the mean dynamic ocean topography in the AVISO. Since there are few available independent SST data in the offshore regions, the RMSDs relative to the assimilated Himawari-8 SSTs are also calculated. The anomalies used to calculate ACCs are high-passed values from a 31-day moving filter. We calculate the ratios of the RMSDs, absolute biases, and ACCs of the LORA relative to the JCOPE2M/AVISO, as represented by

$${R}_{RMSD}=\frac{{(RMSD)}_{JCOPE2M/AVISO}-{(RMSD)}_{LORA}}{{(RMSD)}_{JCOPE2M/AVISO}}\times 100,$$
(1)
$${R}_{Bias}=\frac{{\left|Bias\right|}_{JCOPE2M/AVISO}-{\left|Bias\right|}_{LORA}}{{\left|Bias\right|}_{JCOPE2M/AVISO}}\times 100, and$$
(2)
$${R}_{ACC}=\frac{{{(ACC)}_{LORA}-(ACC)}_{JCOPE2M/AVISO}}{{(ACC)}_{JCOPE2M/AVISO}}\times 100,$$
(3)

respectively. Here, the subscripts \(LORA\), \(JCOPE2M\), and \(AVISO\) indicate LORA-WNP and -MC, JCOPE2M, and AVISO datasets. \(\left|\bullet \right|\) denotes taking the absolute value. The positive (negative) ratios in Eqs. (1), (2), and (3) mean that the LORA more (less) corresponds to observations than the JCOPE2M/AVISO in terms of the RMSDs, absolute biases, and ACC, respectively. Significant RMSD and absolute bias differences of the LORA with the JCOPE2M and AVISO are detected at a 99% confidence level using a bootstrap method with 10,000 cycles.

4 LORA-WNP

4.1 Climatological ensemble mean and spread

As Ohishi et al. (2022a, b) demonstrated that an EnKF-based ocean data assimilation system with a frequent assimilation interval might produce unrealistic fields if suitable settings are not applied, we first confirm the ensemble mean and spread in temperature, salinity, and horizontal flow at the sea surface and zonal (meridional) section along (across) the KE averaged over 2016–2018 (Figs. 2, 3, and 4). Figure 2 shows that the LORA-WNP reproduces strong SST and SSS fronts along the quasi-stationary J1 jet [145°–155°E, 40°N] (Isoguchi et al. 2006) and strong currents along the Kuroshio and KE, as consistent with previous studies (cf. Figure 2 of Kida et al. 2015; Fig. 1 of Ohishi et al. 2019a, b). The large-scale spatial patterns in the SST, SSS, SSH, and surface flow fields in the LORA-WNP also correspond well to the observations (figure not shown). The SST and SSS ensemble spread is large along both the KE and J1 jet, whereas the large ensemble spread in SSH and surface horizontal flow is found only around the KE region.

Fig. 2
figure 2

Ensemble mean (contours) and spread (color shading) of sea surface a temperature (SST), b salinity (SSS), c zonal and d meridional velocities, and e height (SSH) averaged over 2016–2018. Thin (Thick) contour intervals are 2 (10) °C in a, 0.25 (1) in b, 0.2 (1) m s−1 in c and d, and 0.25 (1) m in e. Solid (dashed) contours denote eastward (westward) flow in c and northward (southward) flow in d. In a, the white line indicates the zonal (meridional) section along 36°N (150°E) used in Figs. 3 and 4, and the star symbol denotes the location of the Kuroshio Extension Observatory (KEO) buoy

Fig. 3
figure 3

a Zonal and b meridional sections along 36°N and 150°E, respectively, of temperature ensemble mean (contours) and spread (color shading) averaged over 2016–2018. c and d Same as a and b, but for salinity. Thin (Thick) contour intervals are 2 (10) °C in a and b, and 0.25 (1) in c and d. In b and d, white lines indicate the positions of the KE (36°N) and J1 jet (40°N)

Fig. 4
figure 4

a Zonal and b meridional sections along 36°N and 150°N, respectively, of ensemble mean of zonal velocity averaged over 2016–2018. c and d Same as a and b, but for ensemble spread. eh Same as ad, but for meridional velocity. In b, d, f, and h, white lines indicate the positions of the KE (36°N) and J1 jet (40°N)

The temperature and salinity ensemble means in the ocean interior in the LORA-WNP represent the following typical water masses corresponding to the observations (Fig. 3): subtropical mode water (STMW; e.g., Masuzawa 1969; Suga and Hanawa 1995) characterized by a vertical stratification minimum around 25°–30°N and 200–400 m depth; high-salinity North Pacific tropical water (NPTW; e.g., Cannon 1966; Katsura et al. 2013) over 0–200 m depth south of 25°N; and low-salinity North Pacific intermediate water (NPIW; e.g., Talley 1993; Yasuda 1997) around 20°–35°N and 500–1000 m depth. Figure 4a, b shows that the strong eastward KE extends to ~ 1000 m depth, as consistent with mooring observations (cf. Figures 6 and 8 of Waterman et al. 2011). As reported by historical ship observations along the 137°E section (cf. Figure 4 of Oka et al. 2018), there are weak westward flows south of the KE and south of 25°N in association with the Kuroshio recirculation gyre and the North Equatorial Current, respectively (Fig. 4b). The large temperature spread is distributed over 200–500 m depth around the main thermocline along the KE and 0–200 m depth along the J1 jet (Fig. 3b). The salinity spread is large from the surface to the sub-surface layer along the KE and J1 jet (Fig. 3d). The large ensemble spread in the horizontal flow along the KE reaches ~ 1000 m depth (Fig. 4c, d, g, h).

Around the KE region, large variations associated with the meanders and eddies lead to the large ensemble spread in the horizontal flow. According to thermal wind theory, the sharp meridional temperature slope in the main thermocline is related to the strong zonal flow. Consequently, the large temperature spread in the main thermocline around the KE region is theoretically consistent with the large spread in the zonal flow. As the J1 jet position is quasi-stationary due to topographic effects (Isoguchi et al. 2006; Mitsudera et al. 2018), the ensemble spread in the horizontal flow along the J1 jet is not as large as that for the KE. A huge amount of turbulent heat is released around the SST fronts because of strong wind speed and large air–sea humidity and temperature differences (e.g., Ohishi et al. 2016, 2019a; Tozuka et al. 2017); therefore, the perturbed atmospheric forcing may result in the large temperature and salinity spread around the J1 jet. It is beyond the scope of this study to quantitatively investigate the causes of the formation of the large ensemble spread.

Therefore, the LORA-WNP appears to reproduce the climatological temperature, salinity, and horizontal flow fields. Around the KE region, the large ensemble spread in the temperature, salinity, and horizontal flow is probably caused by the large variations associated with the abundance of the meanders and eddies. Around the J1 jet region, the ensemble spread in temperature and salinity is large, whereas that in horizontal flow is relatively small, probably because the J1 jet position is stable due to the topographic effect.

4.2 Validation in the offshore region

The RMSDs of the LORA-WNP, JCOPE2M, and AVISO relative to surface zonal and meridional velocities from the drifter buoys are calculated for 5° longitude × 5° latitude bins (Figs. 5 and 6). Large RMSDs in the surface zonal and meridional velocities are distributed along the Kuroshio and KE in all datasets (left panels of Figs. 5 and 6). The RMSD ratios of the LORA-WNP relative to JCOPE2M indicate that the LORA-WNP more closely corresponds to the drifter buoys than the JCOPE2M in the mid-latitude region, especially along the KE (Figs. 5c and 6c). The LORA-WNP is slightly closer to the observational meridional velocity only in the eastern part of the subtropical region than the JCOPE2M, but less close to the zonal and meridional velocities over most of the subtropical region. The RMSDs of the zonal and meridional velocities in the AVISO are smaller than those in the LORA-WNP over almost the whole domain, especially along the Kuroshio and KE (Figs. 5e and 6e). The RMSDs of the zonal (meridional) velocity averaged over the whole analysis period and domain are 0.259 (0.248) m s−1 for the LORA-WNP, 0.266 (0.268) m s−1 for the JCOPE2M, and 0.228 (0.212) m s−1 for the AVISO. The statistical analyses indicate that the LORA-WNP is significantly closer to the surface zonal (meridional) velocity from the drifter buoys than the JCOPE2M by 3.8% (7.5%) and is significantly less close than the AVISO by 12% (17%).

Fig. 5
figure 5

Root mean square deviations (RMSDs) of the surface zonal velocities from a LORA-WNP, b JCOPE2M, and d AVISO relative to the drifter buoys in 5° longitude × 5° latitude bins averaged over 2016–2018. The RMSD ratios of the LORA-WNP relative to the c JCOPE2M and e AVISO. Contours indicate SSH averaged over 2016–2018 from the LORA-WNP in a, JCOPE2M in b and c, and AVISO in d and e. Thin (thick) contour intervals are 0.25 (1) m. RMSDs averaged over the whole analysis period and domain are shown at the bottom right corner in the left panels

Fig. 6
figure 6

Same as Fig. 5, but for surface meridional velocity

Since there are no independent in situ SST data providing full coverage of the offshore region, the RMSDs are calculated relative to the Himawari-8 SSTs assimilated in both LORA-WNP and JCOPE2M (Fig. 7). The larger RMSDs are distributed at higher latitude in both datasets (Fig. 7a, b). This is consistent with Kurihara et al. (2016), who indicated that the Himawari-8 accuracy is lower at higher latitude, where the satellite zenith angle is larger. The LORA-WNP is closer to Himawari-8 SSTs than JCOPE2M in broad domains away from coasts, especially in the mid-latitude region, although the JCOPE2M corresponds better in the coastal region, eastern Japan Sea, and East China Sea (Fig. 7c). The RMSDs averaged over the whole analysis period and domain are 0.86 and 0.95 °C in the LORA-WNP and JCOPE2M, respectively, and the RMSDs of the LORA-WNP are significantly smaller than that of the JCOPE2M by 9.5%.

Fig. 7
figure 7

RMSDs of the a LORA-WNP and b JCOPE2M relative to the assimilated Himawari-8 SSTs averaged over 2016–2018. c The RMSD ratios of the LORA-WNP relative to the JCOPE2M. Contours indicate SSTs averaged over 2016–2018 from the LORA-WNP in a, JCOPE2M in b, and Himawari-8 in c. Thin (thick) contour intervals are 2 (10) °C. The RMSDs averaged over the whole analysis domain and period are shown at the bottom right corner in a and b

To validate the ocean-interior of the LORA-WNP and JCOPE2M, the RMSDs relative to independent observations by the KEO buoy south of the KE are calculated (cf. Figure 2a). The temperature RMSDs of both datasets increase at deeper depths (Fig. 8a). In 0–300 m depth the temperature of the LORA-WNP is significantly closer to the KEO buoy than that of the JCOPE2M, whereas it is less close below 300 m depth; however, the difference is not significant. This is likely because the prescribed vertical localization scale of 100 m in the LORA-WNP confines the impacts of assimilation of surface observations within the upper 0–370 m depth. Although this is a result from a single observation point, the better temperature fit in the LORA-WNP may be limited to the upper 300 m depth over the whole domain. In both datasets, the salinity RMSDs are smaller at 100–300 m depth and larger near the surface and below 300 m depth (Fig. 8b). Over the whole depth, the LORA-WNP shows larger salinity RMSDs than the JCOPE2M, but the difference is not significant. At most of the observation points, the RMSDs in the zonal and meridional velocities indicate that the LORA-WNP significantly corresponds better to the KEO buoy than the JCOPE2M (Fig. 8c, d). The results regarding the RMSDs of the SSTs and surface horizontal velocity at the KEO buoy are consistent with those from the Himawari-8 and drifter buoys, respectively (Figs. 5, 6, and 7).

Fig. 8
figure 8

RMSDs of the LORA-WNP (black) and JCOPE2M (red) relative to a temperature, b salinity, and c zonal and d meridional velocities from the KEO buoy averaged over 2016–2018. Crosses indicate significant RMSD differences relative to the LORA-WNP at a 99% confidence level. Note that the depth ranges in a and b are different from those in c and d

4.3 Validation in the nearshore region

Figure 9 shows the RMSDs of the LORA-WNP and JCOPE2M relative to independent coastal SSTs over the analysis period. The RMSDs of the LORA-WNP exceed 1 °C at almost all stations, and they are especially large in northern Japan (Fig. 9a). Although the JCOPE2M also shows relatively large RMSDs of 1–2 °C (Fig. 9b), the RMSDs in the JCOPE2M are smaller than the LORA-WNP over almost all stations (Fig. 9c). To gain insight into the causes of the large RMSDs in both datasets, the SST time series at Katsura Island on the northeast coast of Japan (white circle in Fig. 9a), where the RMSD is the largest among all stations, are shown in Fig. 10. The LORA-WNP and JCOPE2M have high SST biases in the winter season, and the biases are higher in the LORA-WNP than in the JCOPE2M. As shown in Fig. 11, for most of the stations, both datasets have high SST biases over 0.5 °C (Fig. 11a, b), and the biases are higher in the LORA-WNP than the JCOPE2M (Fig. 11c). The spatial pattern of the absolute bias ratios is similar to the RMSD ratios (Figs. 9c and 11c), and thus the higher SST biases result in larger RMSDs in the LORA-WNP than the JCOPE2M.

Fig. 9
figure 9

RMSDs of the SSTs from a LORA-WNP and b JCOPE2M relative to in situ SSTs from the Japan Fisheries Research and Education Agency (FRA), Japan Oceanographic Data Center (JODC), Copernicus Environment Monitoring Service (CMEMS), and Kochi Prefecture Fisheries Promotion Department Fisheries Experiment Station, averaged over 2016–2018. c The RMSD ratios of the LORA-WNP relative to the JCOPE2M. In a, the red dot with a white border denotes Katsura Island with the largest RMSDs among all stations. In c, circles and triangles indicate significant and no significant RMSD differences, respectively, at a 99% confidence level

Fig. 10
figure 10

SST timeseries of the LORA-WNP (orange), JCOPE2M (cyan), and in situ observations (black) at Katsura Island for 2016–2018

Fig. 11
figure 11

Same as Fig. 9, but for biases in a and b and the absolute bias ratios in c

To remove the contributions from the biases, the ACCs are calculated using SSTs high-passed by a 31-day moving filter (Fig. 12). At all stations, the ACCs are positive in both datasets (Fig. 12a, b). The ACC ratios of the LORA-WNP relative to the JCOPE2M indicate that the LORA-WNP has higher correlations with in situ SST data in most of the stations than the JCOPE2M (Fig. 12c). Therefore, in the coastal region, the RMSDs in the LORA-WNP are larger than in the JCOPE2M because of the higher SST biases, but the sub-monthly temporal variations are better reproduced in the LORA-WNP.

Fig. 12
figure 12

Same as Fig. 9, but for anomaly correlation coefficients (ACCs)

Although we adopt the JRA-55 for the atmospheric forcing in the LORA, the horizontal resolution of 1.25° might not be sufficient to resolve coastlines. As a result, it seems that the fine coastline in the data assimilation system is not well captured by the coarse land–sea mask in the JRA-55, and thus weak land-surface wind is assigned to nearshore system grid cells. Sensitivity experiments on horizontal resolution for atmospheric reanalysis datasets are currently underway. The AOEI might also contribute to the high SST biases in the LORA-WNP, as it reduces the analysis increments by inflating the observation errors where the innovation is large. However, as the AOEI substantially improves the low-salinity NPIW structure around the KE region (Ohishi et al. 2022b), it is better to apply the AOEI.

The RMSDs of the LORA-WNP, JCOPE2M, and AVISO relative to the SSH anomalies from the tide gauges over the analysis period are calculated, although there may be room for improvement in the coastal atmospheric forcing (Fig. 13). The RMSD ratios of the LORA-WNP relative to the JCOPE2M indicate significantly lower RMSDs in the LORA-WNP in northern Japan but significantly larger RMSDs in the other regions than in the JCOPE2M (Fig. 13c). The RMSDs of the AVISO are significantly smaller than that of the LORA-WNP over almost all stations (Fig. 13e). These findings are roughly consistent with the results from the drifter buoys.

Fig. 13
figure 13

RMSDs of the a LORA-WNP, b JCOPE2M, and d AVISO relative to SSH anomalies from the University of Hawaii Sea Level Center (UHSLC) averaged over 2016–2018. The RMSD ratios of the LORA-WNP relative to the c JCOPE2M and e AVISO. In c and e, circles and triangles indicate significant and no significant RMSD differences, respectively, at a 99% confidence level

5 LORA-MC

5.1 Winter and summer climatological ensemble mean and spread

Southeast Asia undergoes remarkable seasonal variations of both atmosphere and ocean (Fig. 14). In the boreal winter with the northeasterly monsoon, a cyclonic circulation develops in the SCS (Fig. 14b). The cold tongue along the eastern coast of Vietnam is induced by southward cold advection and turbulent heat release (Fig. 14a; Thompson et al. 2016; Seow and Tozuka 2019). In contrast, the southeasterly monsoon dominates in boreal summer, and the SCS is characterized by an anticyclonic circulation (Fig. 14d). Near the east coast of Vietnam, the coastal upwelling caused by the southeasterly wind results in localized cool SST (Fig. 14c; Xie et al. 2003; Dippner et al. 2007).

Fig. 14
figure 14

a SSTs from the Himawari-8 and b SSH from the AVISO in February averaged over 2016–2018. c and d Same as a and b, but for August. Thin (thick) contour intervals are 1 (5) °C in a, 0.5 (5) °C in c, and 0.1 (0.5) m in b and d

Figures 15, 16, and 17 show the ensemble mean and spread of the SST, SSH, and SSS in the LORA-MC in February and August averaged over 2016–2018. In the SCS, the LORA-MC reproduces the cold tongue associated with the cyclonic circulation in boreal winter and the localized cool SST and anticyclonic circulation in boreal summer, as consistent with the observations (Figs. 15a, c, and 16a, c). In the summer rainy season with the large river discharge from the Mekong River, the salinity near the east coast of Vietnam is lower than in winter (Fig. 17a, c).

Fig. 15
figure 15

SST ensemble a mean and b spread in February averaged over 2016–2018. c and d Same as a and b, but for August. Thin (thick) contour intervals are 1 (5) °C in a and 0.5 (5) °C in c

Fig. 16
figure 16

Same as Fig. 15, but for the SSH. Thin (thick) contour intervals are 0.1 (0.5) m

Fig. 17
figure 17

Same as Fig. 15, but for sea surface salinity (SSS). Thin (thick) contour intervals are 0.25 (2)

The SST ensemble spread is large around the Kuroshio region in February and relatively small in other regions (Fig. 15b). In August, the relatively large SST spread near the east coast of Vietnam may be related to the sensitivity of wind-induced cool SST to the perturbed atmospheric forcing (Fig. 15d). Large SSH spread is found around the Kuroshio region only in August when the Kuroshio transport is seasonally large because of the joint effect of baroclinicity and bottom relief (JEBAR; Sakamoto and Yamagata 1996; Fig. 16b, d). The SSS ensemble spread is large around the Kuroshio region in both February and August (Fig. 17b, d). The position of the salinity spread maximum corresponds to the large SST and SSH spread in winter and summer, respectively, and thus the formation mechanisms of the large salinity spread may be different. As in the LORA-WNP system, it is beyond the scope of the present study to quantitatively investigate the processes responsible for the large ensemble spread.

5.2 Validation of the LORA-MC

As described in Sect. 1, there are few in situ and independent observations in the MC region (cf. Figures 1 and 18). For example, there are more than 10 drifter buoy observations per month per bin in the offshore region, but only a few observations are available around the coastal equatorial region (Fig. 18). In this subsection, the RMSDs and biases of the LORA-MC are calculated relative to the independent observations by the drifter buoys and tide gauges and the assimilated Himawari-8 SSTs (Figs. 19, 20, and 21).

Fig. 18
figure 18

Frequency of the surface drifter buoy observations in 5° longitude × 5° latitude bins in 2016–2018. Note that white color indicates no observations

Fig. 19
figure 19

RMSDs of the LORA-MC relative to surface a zonal and b meridional velocities from the drifter buoys for 5° longitude × 5° latitude bins averaged over 2016–2018. c and d Same as a and b but for the AVISO. The RMSD ratios of surface e zonal and f meridional velocity of the LORA-MC relative to the AVISO. Contours denote the SSH averaged over 2016–2018 from the LORA-MC in a and b and the AVISO in c and d. Thin (thick) contour intervals are 0.1 (1) m. RMSDs over the whole analysis period and domain are shown in the top left corner in ad

Fig. 20
figure 20

a RMSDs and b biases of the LORA-MC relative to the Himawari-8 SSTs averaged over 2016–2018. Contours denote the a LORA-MC and b Himawari-8 SSTs averaged over 2016–2018. Thin (thick) contour intervals are 1 (5) °C. The RMSD over the whole analysis period and domain is shown in the top left corner in a

Fig. 21
figure 21

RMSDs of the a LORA-MC and b AVISO relative to SSH anomalies from tide gauges. c The RMSD ratios of the LORA-MC relative to the AVISO. In c, circles and triangles indicate significant and no significant RMSD differences, respectively, at a 99% confidence level

As in Sect. 4.2, the RMSDs of the LORA-MC and AVISO relative to the drifter buoys are calculated for the surface zonal and meridional velocity fields in 5° longitude × 5° latitude bins (Fig. 19). In the offshore region, the RMSDs in both zonal and meridional velocities are relatively small in the LORA-MC and AVISO. There are large RMSDs over 0.3 m s−1 in the LORA-MC around the Kuroshio region and parts of the coastal equatorial region (near the eastern coast of Kalimantan Island and the western boundary of the system; Fig. 19a, b). In the AVISO, large RMSDs over 0.3 m s−1 are distributed more broadly around the coastal equatorial region than in the LORA-MC (Fig. 19c, d). The RMSD ratios of both zonal and meridional velocities of the LORA-MC relative to the AVISO indicate that the LORA-MC is closer to the drifter buoys than the AVISO around the equatorial coastal region and is less close in most of the offshore region (Fig. 19e, f). The larger RMSDs of the AVISO in the equatorial coastal region may be because the small number of drifter observations results in less accurate mean dynamical ocean topography. For the surface zonal and meridional velocities, the spatiotemporally averaged RMSDs of the LORA-MC are 0.275 and 0.265 m s−1 for the LORA-MC, respectively, and are significantly and larger than that of the AVISO of 0.254 and 0.223 m s−1 by 8.3 and 18.8%.

The RMSDs and biases relative to the assimilated Himawari-8 SSTs are calculated over the analysis period (Fig. 20). There are small (large) RMSDs in the offshore (nearshore) regions (Fig. 20a), and the RMSD averaged over the whole analysis period and domain is 0.62 °C. As in the LORA-WNP, the high SST biases relative to the Himawari-8 are distributed near the coastal regions in the LORA-MC (Fig. 20b), and thus they result in the large RMSDs. As discussed in Sect. 4.3, more accurate atmospheric forcing around nearshore regions would be required to improve the accuracy in the LORA-MC.

The RMSDs of the LORA-MC and AVISO over the analysis period are calculated relative to the SSH anomalies from tide gauges (Fig. 21). The LORA-MC and AVISO have smaller RMSDs around the equatorial coastal region than in the Kuroshio region (Fig. 21a, b). The RMSD ratios of the LORA-MC relative to the AVISO indicate that the AVISO significantly better fits to the tide gauge observations than the LORA-MC at almost all stations (Fig. 21c). Here, the LORA-MC and AVISO are positively correlated with the observations at all stations, although the AVISO has higher correlation than the LORA-MC at almost all stations (figure not shown). Consequently, the LORA-MC reproduces temporal variations in the SSH to some extent, but the AVISO corresponds better than the LORA-MC at almost all stations.

6 Summary

We have constructed LETKF-based high-resolution regional ocean analysis datasets named LORA (LETKF-based Ocean Research Analysis) in the WNP and MC regions, i.e., LORA-WNP and LORA-MC, respectively. The validation results of the LORA, JCOPE2M, and AVISO were compared by calculating the RMSDs, biases, and ACCs relative to the independent observations (the surface drifter buoys, KEO buoy, in-situ coastal observations, and tide gauges) and assimilated Himawari-8 SSTs.

The LORA-WNP appears to reproduce well the climatological temperature, salinity, horizontal flow, and SSH fields. It has the large ensemble spread in the SST, SSS, SSH, and surface horizontal flow fields around the KE region, whereas the ensemble spread is large only in the SST and SSS fields along the J1 jet (Figs. 2, 3, and 4). The stationary position of the J1 jet caused by topographic effects probably results in the relatively small ensemble spread in the horizontal flow and SSH.

The RMSDs of the LORA-WNP relative to the horizontal flow measured by the surface drifter buoys and the KEO buoy indicate that the LORA-WNP is less close to the independent observations than the JCOPE2M in the subtropical region but is closer in the mid-latitude regions, especially along the KE, although the AVISO is the closest of the three datasets (Figs. 5, 6, and 8c, d). The Himawari-8, KEO buoy, and coastal SST observations show that the SST of the LORA-WNP broadly better fits to the observations than that of the JCOPE2M in the subtropical and mid-latitude regions except for the coastal regions, eastern part of the Japan Sea, and East China Sea (Fig. 7). The smaller temperature RMSDs of the LORA-WNP than of the JCOPE2M may be limited to the depth shallower than 300 m over the whole domain, likely because of the prescribed vertical localization scale of 100 m (i.e., cutoff length scale of 370 m) in the LORA-WNP, although this is a result from a single independent observation from the KEO buoy south of the KE (Fig. 8a). In the coastal regions, the higher SST biases in the LORA-WNP than in the JCOPE2M result in larger RMSDs, but the sub-monthly temporal variations are better reproduced in the LORA-WNP than the JCOPE2M (Figs. 9, 10, 11, and 12). The salinity RMSDs are larger in the LORA-WNP than the JCOPE2M over the whole depth at the KEO buoy, although the difference is not significant (Fig. 8b).

The LORA-MC also represents realistic climatological fields: the cold tongue along the east coast of Vietnam associated with the cyclonic circulation in boreal winter, and wind-induced localized cool SST near the east coast of Vietnam and anticyclonic circulation in the SCS in boreal summer (Figs. 15a, c, 16a, c, and 17a, c). Around the Kuroshio region, there is large ensemble spread in the SST (surface horizontal flow) in winter (summer), and the SSS spread is large in both seasons (Figs. 15b, d, 16b, d, and 17b, d). In the rest of the MC region, the ensemble spread in SST, SSS, and surface horizontal flow fields is relatively small.

The RMSDs of the LORA-MC relative to the drifter buoys are smaller than those of the AVISO around the equatorial coastal region, probably because of less accurate mean dynamical ocean topography in the AVISO caused by the small number of drifter buoy observations, whereas they are larger in the offshore region (Figs. 18 and 19). The results using the Himawari-8 SST show large RMSDs around the nearshore region because of high SST biases, as in the LORA-WNP, and small RMSDs in the offshore region (Fig. 20). The RMSDs relative to the tide gauges indicate better fit of the SSH in the AVISO than the LORA-MC at almost all stations. In summary, the LORA-WNP and LORA-MC have sufficient accuracy to be used in a variety of geoscience research areas, as well as for fisheries and marine transport applications and marine environment consultants.

Although various institutions are now developing EnKF-based ocean data assimilation systems, no high-resolution EnKF-based ocean data assimilation products are available in the Pacific region to the best of the authors’ knowledge. Multiple global high-resolution reanalysis datasets are currently available as described in Sect. 1, whereas the LORA has limited domains in the WNP and MC regions. This allows higher resolution analysis in the LORA. Moreover, the LORA-WNP and -MC have unique capability to visualize the spatiotemporal changes of the uncertainty represented by the ensemble spread. Since the LORA includes all terms in the heat (salinity) budget equation including the advection, diffusion, surface heat (freshwater) fluxes, and analysis increments, the LORA can be used to quantitatively investigate the causes of spatiotemporal temperature and salinity variations. The LORA-WNP and -MC would be useful for geosciences research in the mid-latitude and equatorial regions, respectively, except for nearshore areas, where the accuracy of the LORA-WNP and -MC seems to be not as reliable. For further improvement, our future study will investigate the impact of using high-resolution atmospheric forcing because nearshore weak wind speed may be an artifact of low-resolution atmospheric forcing and the reason for the nearshore high SST biases in the LORA. Furthermore, we will investigate the predictability for the Kuroshio path shift to a large meander path in summer 2017 (Sugimoto et al. 2020; Sakajo et al. 2022) by conducting ensemble prediction experiments. Although the system still includes tuning parameters for ensemble size, localization scales, observation errors, and uncertainty in the atmospheric forcing, which might improve the accuracy, it is beyond the scope of the present study to survey them. We are now planning to extend the analysis period and to construct a near-real-time ensemble forecast system. We will publish the daily-mean sea-surface analyses for all 100 ensemble members as well as 3D-ensemble mean and spread in the LORA-WNP and LORA-MC on a web site in 2023.