Applying the adjoint-free 4dVar assimilation to modeling the Kuroshio south of Japan

Operational ocean nowcast/forecast systems require real-time sampling of oceanic data for representing realistic oceanic conditions. Satellite altimetry plays a key role in detecting mesoscale variability of the ocean currents. The 10-day sampling period and horizontal gaps between the altimetry tracks of 100 km cause difficulties in capturing shorter-term/smaller-scale ocean current variations. Operational systems based on a three-dimensional variational method (3dVar) do not take into account temporal variability of the data within data assimilation time windows. Four-dimensional data assimilation technique is considered as a possible tool for more efficient utilization of the observations arriving from satellite altimeters by the dynamically constrained interpolation. In this study, we develop and test the performance of the adjoint-free four-dimensional variational method (a4dVar) for operational use in regional models. Numerical experiments targeting the Kuroshio path variations south of Japan demonstrate that the a4dVar scheme dynamically corrects the initial condition so as to effectively reduce the satellite altimetry data misfit during a 9-day time window. The corrected initial condition further contributes to improvements in the skill of reconstruction of the Kuroshio path variation in a 30-day lead hindcast run.


Introduction
Utilization of observation data on a real-time basis is crucial for the operational ocean nowcasting/forecasting based on dataassimilative ocean general circulation models (Bell et al. 2015). Assimilation of satellite data including sea surface temperature (SST) and sea surface height anomalies (SSHA) significantly improves the estimates of positions, scales, and intensities of mesoscale features of ocean current phenomena. In situ temperature and salinity profile data (T/S) are essential for correction of the water mass properties simulated by the ocean models.
The variational method is one of the major approaches in data assimilation (DA). The three-dimensional DA (3dVar) schemes have been adopted for a long time to provide the nowcast/ forecast information of oceanic conditions (e.g., Miyazawa et al. 2009). However, the 3dVar technique does not take into account information on the time evolution of ocean state within the DA time window despite the fact that this information is always present in the observations. The four-dimensional DA (4dVar) technique naturally incorporates prior information on the ocean dynamics, and the respective numerical methods based on the tangent linear and adjoint (TLA) codes have been already developed for several ocean general circulation models (OGCMs) (e.g., Wunsch and Heimbach 2007;Moore et al. 2011;Ngodock and Carrier 2014;Usui et al. 2015).
Development and maintenance of the TLA codes in addition to the forward model codes, which perpetually increase in complexity, require significant efforts and expenses. For mitigation of such difficulties, various kinds of ensemble-based variational methods have been proposed (Zupanski 2005;Liu et al. 2008;Zhang et al. 2009;Yaremchuk et al. 2009). These methods do not require development of the TLA codes and employ ensemble perturbations to estimate the cost function gradients.
The adjoint-free four-dimensional variational method (a4dVar) developed by Yaremchuk et al. (2016, hereinafter Y16) seems to be suitable for oceanic applications of ensemblebased 4dVar methods because the background error covariance (BEC) of this scheme does not always require prescribed model ensembles that should represent the realistic model error statistics. In the ocean, observations are less abundant than in the atmosphere and the ensemble-based BEC estimates which constitute the backbone of other ensemble-based methods (Liu et al. 2008;Zhang et al. 2009) tend to be much less accurate. The a4dVar method has been further modified to include hybrid of seeds of the ensemble perturbations and the balance operator representing the geostrophic and hydrostatic constraints included in BEC (Yaremchuk et al. 2017, hereinafter Y17).
In the last decade, the Japan Coastal Ocean Predictability Experiment (JCOPE) has been in operation for comprehensive studies of ocean current predictability using OGCMs with a particular focus on the Kuroshio path variations south of Japan Miyazawa 2013, 2014). The DA scheme of the present operational version is based on multi-scale 3dVar (ms3dVar) (Miyazawa et al. 2017, hereinafter M17;Miyazawa et al. 2019). Motivated by recent progress of the computer facilities characterized by the perpetual increase in the number of available processors, significant efforts have been made to develop ensemble-based DA techniques that are highly efficient in taking the full advantage of the massively parallel architectures of modern computers. In the framework of the JCOPE project, these techniques include the ensemble Kalman filter (Miyazawa et al. 2012 and have no option of using TLA codes. In the present study, we explore the feasibility of implementation of the a4dVar scheme for representing the Kuroshio variations south of Japan from a viewpoint of applicability of the scheme for operational use. Although the a4dVar scheme demonstrated compatibility with 4dVar in terms of skill and computational efficiency in regional application (Y16; Y17), its feasibility in operations is still unclear. In the present paper, we focus on the (a) selection of the a4dVar parameters suitable for representing the Kuroshio variations south of Japan, (b) assessment of the possible skill improvements compared to 3dVar, and (c) exploration of the impact of the balance operator on the DA performance with a particular focus on assimilation of SSHA component of the data. Studying these issues is a prerequisite before comprehensive investigating the applicability of the a4dVar scheme in future operations. This paper is organized as follows. Section 2 provides a basic description of the model and the 3dVar scheme used in the present study and details of the a4dVar implementation and describes sea level data used for independent validation of the DA experiments. In Section 3, we provide the major results of the a4dVar implementation, focusing on parameter sensitivity and a4dVar comparison with the operational 3dVar in terms of the skill in reproducing oceanic conditions including the Kuroshio path variations south of Japan. Section 4 is devoted to discussion and summarizes the results.
2 Methods and data 2.1 A regional ocean model with a multi-scale 3dVar DA scheme To examine the feasibility of a4dVar implementation for DA simulations of the Kuroshio path variations south of Japan, a regional OGCM based on sbPOM (Jordi and Wang 2012) was configured in the domain shown in Fig. 1. The domain covers a horizontal range of 28-36°N and 128-142°E with a horizontal resolution of 1/36°and 46 vertical sigma layers. The bottom topography of the model (Fig. 1) was created from the 1/120°d ata set provided by the Japan Hydrographic Association, "JTOPO30." The model was driven by wind and heat fluxes calculated using the bulk formulae ) with 6-houly atmospheric variables provided from the National Centers for Environmental Prediction (NCEP)/National Center for Atmospheric Research (NCAR) reanalysis data (Kalnay et al. 1996). The surface salt flux was relaxed to surface salinity of the monthly mean climatology, World Ocean Atlas 2001 (Conkright et al. 2002) with a restoring rate of 10 m/ 30 days. The lateral boundary condition was specified from an operational nowcast/forecast system, JCOPE2M, with a horizontal resolution of 1/12°and 46 vertical sigma layers (M17). Tide forcing was not included.
Starting from temperature/salinity distributions interpolated from the JCOPE2M product on 1 November 2016 with no motion and zero sea level, the model was spun up for 8 months using the ms3dVar DA scheme developed in M17. Satellite sea surface height anomaly (SSHA), satellite sea surface temperature (SST), and in situ temperature/salinity profiles (T/S) data were assimilated into the model with 2-day interval (see Table 1 for details of the observation data). To reduce computational cost, the ms3dVar DA analysis was represented on a coarser (1/8°, 24 levels) grid and was interpolated to the model grid. Parameters required for ms3dVar were basically the same as those described by M17 except for the setting of SSHA observation error. Time windows of SSHA, SST, and T/S observational coverage were respectively bounded by 4, 1, and 5 days before and after the analysis time. The window size for each type of satellite data was determined by the respective temporal acquisition frequency. The major repeat periods of satellite operations are typically 10 days and 1 day for SSHA (e.g., Jason-3) and SST (e.g., Himawari-8), respectively. The time window length of T/S was empirically determined because of the sampling irregularity in the measurements. The inverse of the SSHA observation error was specified by a Gaussian function of the time interval between the observation and analysis times with the values of 0.1 m at the analysis time and 0.27 m at the ends of the observation window.
Incremental analysis update (IAU; Bloom et al. 1996) of analysis temperature/salinity was also used for better initialization as in M17. The analysis temperature/salinity used for IAU was smoothed spatially and temporally to preserve highfrequency variations represented by the current DA model which has higher horizontal (1/36°) and temporal (daily) resolutions than those of the analysis: 1/8°and 2-day mean, respectively. The first guess state of the a4dVar experiment was the result of IAU on 4 June 2017 (Fig. 2).

Implementation of the adjoint-free 4dVar scheme
The a4dVar minimizes a cost function penalizing a misfit between the observation data and model state together with that between the first guess and model states (Eq. 7) by iterations (Y17; see Appendices 1, 2, and 3 for more details). The a4dVar scheme in the present study was implemented with the following parameters. The background square root error variances V T ;S; e U ; e V; e ζ (Eqs. 20-21) were assumed to be stationary and evaluated by computing root mean square (RMS) variances of the ms3dVar assimilation products for a 1-year period starting from 1 January 2017. To mainly focus on the mesoscale eddy variability, we calculated RMS variance of temperature (salinity) after removing the horizontal means from the temperature (salinity) variable: ΔT t = T t − (T t ) hm denotes daily anomaly from the horizontal mean (T t ) hm and overbars denote temporal averages over the respective data sets. The resulting RMS variance of temperature deviation ΔT t actually represents the mesoscale variability associated with the Kuroshio path variations and the surrounding mesoscale eddy activity south of Japan both at the surface ( Fig. 3a) and at the subsurface levels (Fig. 3b). The observation error matrix R t (Eq. 7) was assumed to be diagonal and stationary. Observation errors of SSHA and SST were 0.1 m and 2°C, respectively. Since the observation error includes the measurement and representativeness errors (Cohn 1997), the magnitudes of the observation errors are usually larger than those of the measurement errors. The amplitude of the representativeness errors depends on the model schemes, grid size, and sampling properties of the observation data. Note that we applied a relatively large (2°C) value of the SST observation error, since we mainly focused on the SSHA assimilation in the present study. Observation error profiles of temperature and salinity were evaluated by taking the horizontal mean values of the square root variances of temperature and salinity deviations, respectively (Fig. 4). The subtraction of the horizontal mean at each vertical level results in amplification of the magnitude around the main thermocline depth (400 m), whose variability is closely related with the mesoscale eddy variations.
The nondimensional magnitude ε of the ensemble perturbations was 0.01. The relative magnitude of the unbalanced components β in the SSH and velocity fields was 0.2. After conducting a series of preliminary experiments, the number of a4dVar iterations (ensemble updates) in minimizing the cost function over the DA window was fixed at 10.
We tested three cases of the DA time window length N (5, 9, and 13 days) for a period starting from 4 June 2017. All observation data including SSHA, SST, and T/S used for the 3dVar assimilation were projected on N + 1 time layers with daily interval in between. Throughout the experiments, the number of search directions (ensemble size) was set to 2N.
In the course of sensitivity experiments, we also tested three combinations of the horizontal correlation scales R o1 and R o2 specifying in the horizontal diffusion operators in the BEC formulation for the T/S and SSH state vector components, respectively: (1) R o1 = 90, R o2 = 45, (2) R o1 = 45, R o2 = 21, and (3) R o1 = 21, R o2 = 21 (km). The horizontal correlation scale is related to the spatial scale of the mesoscale eddies in this region. A typical scale of 100 km south of Japan (M17) provides an upper limit of the scale and the Nyquist scale of 6 km (2× the grid size) should be the lower limit. We found an optimum choice among the three combinations of the parameters used for the sensitivity experiments as described later (Section 3.1). We also investigated the effects of relative weight C B of the BEC by specifying the cost function in the form, where C B is a constant with tested values 2, 3, 4, and 5 (see Eq. 7 for detailed notations). All experiments were performed on the Intel Xeonbased scalar processor system (HPE Apollo 6000 with total 12,240 cores). Each ensemble member run was independently calculated by parallel processing with 108 processor cores. Maximum 108 (cores) × 26 (members) =2808 cores have been simultaneously used for the ensemble simulations. The average wall time for computing 10 a4dVar was close to 6 h.

Independent observation data used for validation
For skill evaluation, we used four types of data, which were not assimilated into the model. The first data set is sea level observation obtained at seven tide gauges along the southern coasts of Japan (Fig. 1). To extract sea level variations associated with ocean currents, we applied the barometric correction (Kawabe 1989) and removed the tidal variations from the data using the tide killer filter (Hanawa and Mitsudera 1985). It is well known that the corrected daily sea level variations along the southern coasts of Japan are closely related to the Kuroshio path and, therefore, can serve as an important indicator of ocean variability at larger scales (Kawabe 1989). The tide gauge sea level (TGSL) data were never assimilated in both 3dVar and a4dVar cases, and thus, they were independent from the DA procedures. SSHA values from the DA model runs were compared with the tide gauge observations after extracting temporal means from both products during the DA target period.
The second type of observations used for validation is the SSHA, SST, and T/S data acquired in a period immediately following the DA window. After optimizing the ocean state at the end of the DA window either 3dVar and a4dVar technique, the model was run freely (without DA) over the period ("hindcast," later described in Section 3.3), and its output was compared with the respective data acquired during this period.
The Kuroshio axis position data reported by the Japan Coast Guard (the JCG Kuroshio report) for monitoring the real oceanic conditions were used for the validation of the hindcast results.
In addition, we also compared the ocean currents at 15 m depth with drifter velocity (DFV) data calculated from drogued drifter data compiled by the Drifter Data Assembly Center (DAC) at the Atlantic Oceanographic and Meteorological Laboratory (AOML). The drifter data were quality-controlled and interpolated to 6-houlry intervals using an optimum interpolation procedure (Lumpkin and Centurioni 2019).

Reference data for skill comparison
For skill comparison, we used daily level 4 gridded SSHA data (L4SSHA) at 0.25°resolution from the COPERNICS Marine Environment Monitoring Service (CMEMS). This data set was synthesized from multiple along-track SSHA observations by using a statistical interpolation technique (Pujol et al. 2016). Both L4SSHA and our DA products have the same source (along-track SSHA data) but were obtained using different (statistical and dynamical, respectively) interpolation techniques. We evaluated the skill of L4SSHA with respect to the tidal gauge SSHA for reference. The L4SSHA data provided from CMEMS include the daily geostrophic current (CMEMS geostrophic current) evaluated from the abovementioned daily SSH data combining the absolute mean sea level and daily L4SSHA data (Pujol et al. 2016). We evaluated the skill of the CMEMS geostrophic current with respect to the drifter data.

Skill metrics
To assess the skills of the DA products, we evaluated the overall reduction ratio of the cost function (1), where the subscripts fg and end denote, respectively, the cost function value of the first guess solution and at the end of assimilation.
For a more detailed assessment, the reductions of the model-data misfit term (the second term) of the cost function (1) were estimated separately for each observation variable type (SSHA, SST, T/S):  To assess similarity in the field variation, we also calculated correlation (COR) and root mean square difference (RMSD) between the model (Model) and observation (Obs) variables (TGSL and DFV), where L is the number of the target observations and overbars denote averages over the respective data sets. To estimate the skill in representing the TGSL observed at the seven stations ( Fig. 1), we further calculated averages of RMSD and COR among all these stations, 3 Results

Experiments on sensitivity to variation of the a4dVar experiments
We conducted a series of experiments for investigating sensitivity of the time window length (N) on the skill. Other parameters were fixed at C B = 3, R o1 = 45, R o2 = 21. The time window lengths affect the efficiency of the cost function minimization as shown in Table 2. The 9-day window is characterized by the largest reduction ratio (rrcost) of the total cost function. Reduction of a cost function term penalizing the distance with SSHA observation (rrcost(ssh)) is less sensitive to the time window length than the other terms. Effects of the horizontal correlation scales R o1 and R o2 on the skills were examined by conducting three experiments with fixed parameters of the 9-day window length and C B = 3 (Table 3). In addition to the DA runs during the time window from 4 to 13 June 2017, we performed hindcast runs from 13 June to 23 June 2017 (10-day lead hindcasts) without DA starting from the optimized states on 13th of June obtained after the DA runs. The use of the larger horizontal scales (R o1 = 90, R o2 = 45 km) results in skill deterioration for both the DA and 10-day lead hindcast periods. The larger-scale diffusion parameters of BEC suppress smaller-scale features in the correction increment to the initial condition by intensively smoothing them (not shown). Table 3 further indicates the smaller correlation scales (R o1 = 21, R o2 = 21 km) tend to produce somewhat worse skills for the 10-day lead hindcast (right column in Table 3).
Using the 9-day DA window (N = 9) and the horizontal correlation scales of R o1 = 45, R o2 = 21 km, we conducted a series of sensitivity experiments by varying the relative weighting C B of the BEC (Eq. 1). Table 4 indicates that the case with C B = 3 provides the best skills for both of the DA run and the 10-day lead hindcast. We thus decided to analyze the a4dVar best case with N = 9, R o1 = 45, R o2 = 21 and C B = 3 (hereafter referred as "a4dVar") in more detail.
The ratio of total cost (rrcost) (Eq. 2) reduced to a fixed level of 0.77 after 10 iterations (not shown) which seems to be significantly larger than that reported in the regional study of Y17 of the Adriatic Sea (0.2). This is because the first guess state in the present study was initialized by the well-tuned 3dVar (M17) but not a free running simulation as in Y17. All terms of the cost function penalizing distances between the model and observations (Eq. 3) show reduction of their values. In particular, the SSHA model-data misfit (rrcost(SSHA)) (Eq. 4) demonstrates the most significant Fig. 5 Time sequences of RMSD between the simulated and observed SSHA during the 9-day DA window for the first guess (blue curve), the a4dVar best case (green curve), and the ms3dVar assimilation product (red curve) reduction (0.52, Table 2) of the initial value. A time sequence of RMSD between the modeled and observed SSHA in the a4dVar case (Fig. 5, green curve) exhibits a reasonable reduction to the level around 0.1 m, which is a prescribed measurement error of SSHA, as compared to relatively large RMSDs of the first guess run starting from the initial condition (Right) Difference in temperature (color) and velocities (vectors) between the a4dVar and the first guess solutions. White-colored arrows represent the positive temperature anomaly associated with a rise of sea level at Hachijojima on 13 July 2017 (see Fig. 11b) Fig. 8 a-c B-smoothed SSHA data misfit (color) and difference in temperature (contours) between the a4dVar and the first guess solutions generated by 3dVar (Fig. 5, blue curve). The RMSD in the a4dVar run also exhibits a similar level of fitting to SSHA as compared to that in the 3dVar assimilation product (Fig. 5, red curve).

Oceanic conditions represented by a4dVar within the 9-day time window
The a4dVar increment to the initial condition δx = x − x f , shown in Fig. 6, indicates that regions of the largest corrections are confined around the Kuroshio main stream. This feature is consistent with the enhanced variability around the Kuroshio front represented by the square root mean variance (Fig. 3), which is included in BEC (Eq. 20). Horizontal scale of the correction in the temperature field is generally larger at the surface (Fig. 6a) than at the subsurface (Fig. 6b). The magnitude of correction to the temperature field is stronger at 200 m depth than at 1 m depth. This is also consistent with the vertical profile of the temperature variance (Fig. 4). In contrast, the correction to the flow field is larger at 1 m depth than at 200 m depth reflecting the structure of vertical profiles of velocity variance (not shown). Figure 7 compares snapshots of the a4dVar-optimized subsurface fields (left) and the respective differences between the optimized and first guess fields (right) during the 9-day DA window. Changes by applying a4dVar are most evident east of 135°E as seen in the increment structure. In particular, the southeastward elongation of the Kuroshio meander around 32°N, 138°E visible on 13 June 2017 in the first guess run (not shown) is less profound in the a4dVar case.
Comparison of the B-smoothed SSHA (Eq. 30) modeldata misfit and difference in the subsurface conditions between the a4dVar and first guess runs (Fig. 8) demonstrates that the a4dVar scheme dynamically corrects the first guess state to reduce model-data misfit at the later observation times. Blue/red colors in Fig. 8 indicate that the first guess run overestimates drop (rise) of SSHA at measurement times.
The negative SSHA misfit on June 4 is partly compensated by the positive increments of the subsurface temperature denoted by the black solid contours (top panel of Fig. 8). This means that the a4dVar scheme vertically projects the information on the SSHA data misfit to the subsurface oceanic condition. The downstream propagation of the temperature increment (to 32°N 138°E on June 13, shown by the white arrow in the right panels of Fig. 7) tends to correct the SSHA data misfits around the measurement places at the measurement times in the a4dVar run (middle and bottom panels of Fig. 8).
Validation by the independent TGSL data ( Table 5, Eq. 6) indicates a4dVar improves the correlation of the first guess run even though it slightly increases RMSD. The correlation of the a4dVar run shows the best skill among all the products. The skills of 3dVar assimilation and L4SSHA product are worse than a4dVar because they have a tendency to smooth temporal variations of SSHA at time scales shorter than the typical satellite repeat cycle of 10 days. Comparison of the surface current with the independent DFV data indicates a slight improvement in the estimate of the skills by the a4dVar iteration (cf. itr = 1 and itr = 10 in Table 6). The ms3dVar assimilation and CMEMS products outperform the a4dVar result during this DA window because the former products assimilate additional data that are not included in the a4dVar DA window. This additional information might be effective for representing the surface currents even it is not obtained within the DA window period, because the temporal scale of the surface drifter currents ranges from 1 to 7 days in the subtropical ocean (McClean et al. 2002).

Comparison of the hindcast run initialized by the a4dVar and other products
We conducted 60-day hindcasting experiments beyond the 9day DA window to investigate the differences in the skills between hindcast runs starting from the 3dVar-and a4dVaroptimized initial conditions on 13 June 2017. We used the 3dVar-initilized condition on 10 June 2017 because the 3dVar scheme assimilated SSHA obtained during the almost the same DA window (from 5 to 13 June) for producing the analysis on 9 June 2017 (Fig. 9). The only difference between the 3dVar and a4dVar runs was the initial conditions: all other external forcing data for providing the surface and lateral boundary conditions were exactly the same. Since the surface forcing and lateral boundary conditions were created from the atmospheric (NCEP/NCAR) and oceanic (JCOPE2M) reanalysis products, the assimilation runs are referred as "hindcasts" that were examined in terms of predictability of the model trajectories initialized on 13 June 2017 by the ms3dVar and a4dVar DA techniques. Figure 10 compares RMSD of SSHA between the hindcast runs and observation data during the first 10-day period from 13 to 23 June 2019. Note that no SSHA data were obtained on 19 June. The a4dVar run shows significantly (3-5 cm) smaller SSHA RMSD within the first 4 days (June 13-17) than the ms3dVar hindcast run (green and blue lines in Fig. 10) and is at a comparable level with the ms3dVar DA product (red line). RMSDs of SST are quite of similar levels around 1°C and gradually increase for both a4dVar and ms3dVar runs as compared to a relatively stable RMSD variation of the ms3dVar product (not shown).
Validation using the independent TGSL data exhibits a significantly better a4dVar hindcast as compared to ms3dVar. Table 7 clearly demonstrates that the a4dVar run considerably outperforms the ms3dVar run for the hindcast periods from 10 to 30 days. Figure 11 a visualizes significant improvements in correlation by a4dVar at all the seven sites for the 30-day hindcast period. A typical example of the skill improvements at Hachijojima (HJ, see Fig. 1 for the location) is shown in Fig. 11b. The a4dVar run successfully represents a sea level rise on 13 July 2017, whereas the ms3dVar run fails to do it, as denoted by an arrow in Fig. 11b. This is because the a4dVar run successfully simulates a ridge of SSH at the east side of the Kuroshio meander around HJ (Fig. 12b), while the ms3dVar run does not (Fig. 12a). The difference in temperature and velocities at 200 m between the a4dVar and ms3dVar runs (Fig. 12c) shows that a positive temperature and anticyclonic circulation anomaly (white arrow in Fig. 12c) is responsible for the prominent SSH ridge around HJ in the a4dVar Fig. 9 Calendar diagrams representing the hindcast runs initialized by a4dVar (blue) and ms3dVar (black) operations Fig. 10 RMSD evolution of the SSHA model-data misfit for various model products for the 10-day hindcast period. Blue, green, and red curves denote the runs initialized by the a4dVar, ms3dVar, and ms3dVar products, respectively  . 11 a Comparisons of the correlations between the modeled and observed sea level data at the seven tide gauge stations for the 30-day hindcast period (13 June-13 July 2017). Station locations are shown in Fig. 1. b Evolution of the sea level anomaly at Hachijojima (HJ). Blue, green, and black curves denote the run initialized by a4dVar, ms3dVar, and observation data, respectively Fig. 12 a A snapshot of SSH (color) on 13 July 2017 in the run initialized by ms3dVar. A blue curve denotes the Kuroshio axis position of the model. White color curves denote the weekly mean observed axis positions provided from the Japan Coast Guard. The duration of the weekly mean period was daily updated and then all position data including the target day in their weekly mean periods were plotted. A red circle indicates the Hachijojima tide gauge station. b Same as in a except for the run initialized by a4dVar. c Difference in temperature (color) and velocities (vectors) between the runs initialized by a4dVar and ms3dVar. A white arrow indicates the positive temperature anomaly associated with the observed increase of sea level at Hachijojima run. This anomaly originates from the positive temperature increment around 33°N, 137°E added to the first guess initial condition (white arrows in the right panels of Fig. 7). Note that RMSD in the latitudinal position of the meander (defined as the strongest SSH gradient positions that were detected among the grids with kinetic energy magnitudes larger than 0.4 m 2 /s 2 in a longitudinal range from 133 to 141°E, shown by the blue curves in Fig. 12 a and b) and the JCG Kuroshio reports (white curves in Fig. 12 a and b) is much larger (0.39°) for the ms3dVar run (Fig. 12a) as compared to the a4dVar (0.25°) run (Fig. 12b). Temporal mean RMSDs for the period from 13 June to 13 July 2017 are 0.37°and 0.29°f or the ms3dVar and a4dVar runs, respectively. Table 7 further indicates that the a4dVar hindcast demonstrates comparable or slightly better skills as compared to the ms3dVar DA and L4SSHA products for the hindcast periods of 10 to 30 days. In particular, the a4dVar run provides the best skill among all the products for the 10-day hindcast period, implying a superiority of the a4dVar scheme which clearly demonstrates a capability of dynamically consistent corrections to the initial condition.
The hindcast skill of the a4dVar run deteriorates for the 60day hindcast period as shown in Table 7. In this case, the Kuroshio meander merged with a cyclonic eddy around 30°N , 140.5°E in the last half of July 2017 causing a significant intensification of the meander at the beginning of August (not shown). Such episodic nonlinear interactions with the meander and the eddy affected the skills. Other simulations (e.g., initialized by a4dVar with C B = 5) demonstrated no such merger event (not shown), and the skills in those cases are better for the 60-day hindcast period than that in the present case. We assume that the Kuroshio path variations at hindcast times exceeding 1 month after the initialization are basically beyond of the predictability limit (Miyazawa et al. 2005).
Validation by the independent DFV data shows that the a4dVar-optimized run results in better skills for relatively longer validation periods than the ms3dVa-optimized run (Table 8). Figure 13 shows sampling positions of the surface drifter velocities during the 10-day (Fig. 13a) and 60-day (Fig. 13b) hindcast periods. Figure 13 suggests that the skill assessment could be affected by the sampling locations and timings. For most of the sampling positions which cover almost the entire model region, the a4dVar-optimized hindcast run exhibits a good enough skill comparable to those of the alternative data assimilation (ms3dVar and CMEMS) products (60-day length in Table 8 and Fig. 13b).
We compared temperature and salinity profiles of the hindcast runs with those independently observed during the same hindcast period. With increasing the target period length, RMSD with the observed data generally increases (not shown). Validation results for the 60-day hindcast length (Fig. 14) show that RMSDs of the a4dVar-optimized run are slightly larger than those of the ms3dVar-optimized run, while the RMSD amplitudes of both runs (Fig. 14) are quite similar to the RMS variances of the BEC (Fig. 4). Although during the 9-day DA window, the a4dVar optimization reduces RMSD of T/S data (Table 2), the a4dVar optimization does not much affect the reproducibility of temperature and salinity distributions during the following hindcast period. This is partly attributed to the fact that a4dVar correction of the initial conditions is mainly driven by improving the fit to SSHA data, although attraction to the T/ S data partly contributes to the overall cost reduction evaluated during the DA window period (Table 2).

Summary and discussion
This study demonstrated the successful implementation of the adjoint-free 4dVar scheme (Y16; Y17) in JCOPE operational ocean nowcast/forecast system originally based on a ms3dVar DA scheme (M17). It is shown that by correcting the first guess initial condition produced by ms3dVar, the a4dVar scheme improves the nowcast/forecast skills in representing the Kuroshio path variations south of Japan. Similar to 4dVar, the a4dVar is capable of reducing the model-data misfits within the accuracy of the measurement errors at right timing within the 9-day DA window used in the experiments.
We attribute the major improvements in fitting SSHA observations to inclusion of the information on the B-smoothed data misfits into the set of ensemble perturbations and the use of the balance operator in specifying the BEC structure. These a4dVar features allowed effective transformation of the information on the SSHA data misfit to the balanced perturbations of subsurface temperature/salinity and velocity fields which Since the a4dVar scheme requires neither a prescribed ensemble representing the background error statistics as in other ensemble-based 4dVar methods (e.g., Liu et al. 2008), nor the TLA codes for efficient estimation of the cost function gradient, it has been smoothly introduced into the established operational forecast system without significant modifications of the system software. The implementation of the a4dVar scheme leveraged the results available from the multi-scale 3dVar DA scheme (M17) which were used as the first guess to safely obtain the incremental improvements of the skills in addition to those delivered by the established operational system. Comparison of the wall time consumption by ms3dVar and a4dVar has shown that, similar to 4dVar, the a4dVar scheme is approximately 8 times more time consuming and requires 115 times more total CPU time. We estimate, however, that these numbers would have decreased at least four times if the ms3dVar analysis were executed at the same resolution as a4dVar, whose grid had 38 times more nodes than the ms3dVar analysis grid. We should also note that such unfavorable comparison would further relax with time due to the perpetual growth of the number of processor cores in the modern computational architectures. However, since the present study dealt only with a specific period of the Kuroshio variations, a comprehensive testing of the a4dVar scheme for other periods/cases awaits further exploration.
The a4dVar skills was found to be sensitive to the relative magnitudes of observation and background errors and the horizontal correlation scales used in the information of BEC, as indicated by Tables 3 and 4. The BEC formulation could be further generalized by including the flow-dependent/ Fig. 14 Profiles of horizontal averages of the RMSD of a temperature and b salinity calculated for the 60-day hindcast period from 13 June to 12 August 2017. Blue, green, and red curves denote the runs initialized by the a4dVar, ms3dVar, and ms3dVar products, respectively anisotropic correlations to further improve the skills of the a4dVar implementation. As an example, development of hybrid schemes (e.g., Yaremchuk et al. 2011) combining ensemble Kalman filter (e.g., Miyazawa et al. 2012) and a4dVar methodologies may be feasible in near future perspectives. In particular, the flow-dependent error covariance represented by the ensemble Kalman filter is effective for the assimilation of the high-resolution SST with relatively short-term variability ).
x 2 x T 1 x are unbalanced (ageostrophic) components of x 2 , and I 1 , I 2 are the identity matrices of respective sizes where V T;S; e U ; e V;ζ are the diagonal root mean square error variance matrices of T ; S; e U ; e V; ζ; C T ;S; e U ; e V;ζ are the respective correlation matrices; and β is the relative magnitude of the unbalanced components of e U ; e V; ζ. C T;S; e U ; e V;ζ could be modeled by the polynomial approximation C p or the exponential expression C e of the solution of a diffusion equation representing horizontal correlation of the target variables: where R o is the decorrelation radii and Δ is the horizontal Laplacian operator. The square root operators of C T;S; e U ; e V are conveniently represented as We used two values of the decorrelation radii: R o1 and R o2 , which were applied for TS (x 1 ) and u ! ζ (x 1 ) variables, respectively (R o1 > R o2 as in Y17). The square root of B and its inverse are represented as, Our implementation adopts the exponential form (C e ) for calculation of B and B 1/2 and utilizes the polynomial form (C P ) for calculation of B 1/2 (C −1=2 p ≅I− R 2 o 4 Δ ), because C e is not invertible numerically.

Appendix 3. Ensemble perturbations within the range of BEC
Consider a generalized eigenvalue problem for a sample covariance matrix XX T : where X is the I × J matrix of sample data, E is the I × J matrix of B-orthonormal eigenvectors, and Λ is the diagonal J × J matrix of the respective eigenvalues. The matrix of search directions S is defined by S = BE. The problem (27) is reduced to the standard eigenvalue problem, where X ′ = B −1/2 X, E ′ = B −1/2 E, and S = B 1/2 E ′ . Columns of the sample data matrix X i at iteration i are composed of the hybrid sequences, where x i t¼n is the model state at iteration i and time n, and is the B-smoothed data misfit. Extraction of the leading modes from the hybrid sequences (29) by solving (27) allows to produce the ensemble perturbations which include information combining the model dynamics (trajectory) and balanced error fields (29, 30) on the current a4dVar iteration.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.