1 Introduction

Since the beginning of modern ocean observations with the Challenger expedition in the 1870s, various geophysical and biochemical variables have been observed. A wide variety of ocean observation data has accumulated since ship-based water sampling began, aided by the development of instruments such as Conductivity-Temperature-Depth (CTD) profilers and eXpendable BathyThermographs (XBTs). The international collaboration in the World Ocean Circulation Experiment (WOCE) and the development of global satellite observations in the 1990s, followed by the start of the international Argo program (Argo Science Team 2001) in the 2000s, have led to a drastic increase in the amount of observed data and a steady reduction of spatiotemporal gaps. Nevertheless, limited spatiotemporal resolution remains a problem, especially for the subsurface ocean. High-accuracy observations are particularly scarce in regions with severe winter sea conditions and in regions far from major shipping routes and land.

With respect to data quality, improvements in sensor accuracy and the development of new observation platforms have made it possible to obtain high-precision data. At the same time, the available data vary widely in quality, from highly accurate shipboard measurements to medium-accuracy data from developing sensors and automatic instruments. Beyond continuous oceanographic observation, various approaches are therefore needed to capture ocean dynamics accurately at the basin scale. In this context, data synthesis offers a way to make better use of the wide variety of available observations.

Synthesis of oceanographic data has been attempted in various ways. For example, the World Ocean Atlas (Monterey and Levitus 1997) uses only observational data to produce gridded fields in time and space by interpolation under statistical assumptions. Many such approaches rely purely on observations and are effective in regions and periods with relatively high observational density. More recently, monthly gridded temperature and salinity fields compiled from Argo float data have been published (Hosoda et al. 2008).

From the late 1990s onward, as computing power continued to improve, the synthesis of observational data on a global scale using numerical models also flourished (e.g., Schiller et al. 2013). These research and development efforts drew on statistical mathematics, control engineering, and the meteorological "data assimilation" techniques already used for weather forecasting.

Data synthesis in oceanography using numerical models can be divided into two main categories. The first assimilates observational data sequentially and primarily aims to provide initial conditions for relatively short-term (days to months) prediction experiments. The second seeks a time evolution of the three-dimensional ocean state that stays close to the observational data while varying smoothly in time, providing a basis for short-, medium-, and long-term (seasons to decades) predictions, dynamical analyses, and estimates of the time evolution of heat and mass transports (e.g., Schiller et al. 2013). The former is often called "ocean reanalysis", whereas we hereafter refer to the latter as ocean state estimation (Wunsch and Heimbach 2013). For ocean state estimation, a method that emphasizes dynamical consistency is desirable, so that observed data are interpolated dynamically (e.g., Stammer et al. 2002a).

Following CLIVAR (Climate and Ocean: Variability, Predictability and Change; http://www.clivar.org), launched in 1995, GODAE (Global Ocean Data Assimilation Experiment; https://www.godae.org) was launched in 1997 to accelerate research on global ocean data synthesis. This effort continues today through the CLIVAR GSOP (Global Synthesis and Observations Panel; http://www.clivar.org/panels-and-working-groups/gsop/gsop.php) and GODAE OceanView (https://www.godae-oceanview.org), which was succeeded by OceanPredict in 2019.

In recent years, global data synthesis studies have also begun to consider observed variables that are not prognostic variables of numerical models. In particular, there is ongoing discussion of how to use observations to refine the mapping of vertical mixing, which is essential to the dynamics of ocean circulation, including the meridional circulation.

In this paper, we present the current status of global ocean state estimation and discuss a future approach to synthesizing vertical-mixing observations with data assimilation systems. Section 2 introduces current ocean state estimation, and Section 3 presents the optimization of vertical diffusion coefficients using temperature and salinity data. Section 4 discusses promising directions for the data synthesis of vertical-mixing observations, and Section 5 summarizes future challenges.

2 Current ocean state estimation

Since the 2000s, several research groups have estimated the ocean state, often by applying a four-dimensional variational adjoint method (e.g., Sasaki 1970; Awaji et al. 2003; Wunsch and Heimbach 2007) or a Kalman smoother (e.g., Evensen and van Leeuwen 2000; Fukumori 2002). These methods typically require large computers and complex code, so only a limited number of institutions conduct data synthesis studies for ocean state estimation. Representative examples of global-scale, long-term state estimation are presented next. The ECCO consortium, led by NASA's Physical Oceanography, Modeling, and Cryosphere Programs in the United States (https://ecco-group.org/home.cgi), was the first to successfully estimate the global ocean state over several decades, and it has provided high-quality data sets such as those reported by Stammer et al. (2002b) and Wunsch and Heimbach (2013). This technology was also transferred to the University of Hamburg, Germany, where it was developed separately as German ECCO (G-ECCO), which has also contributed significantly to global ocean state estimation (e.g., Köhl et al. 2012).

In Japan, the K7 consortium formed by JAMSTEC and Kyoto University has conducted long-term global ocean data synthesis studies since the early 2000s (Awaji et al. 2003; Masuda et al. 2003). In the 2010s, the consortium released a synthesized dataset for climate research (Estimated STate of Ocean for Climate research: ESTOC; Osafune et al. 2015). By applying anomaly data assimilation over the full ocean depth, this dataset successfully reproduces mid- and long-term changes in the deep ocean. Climate change research on deep-water warming (e.g., Fukasawa et al. 2004) using this system is a unique achievement that demonstrates the advantage of state estimation (Masuda et al. 2010). ESTOC has also been used to assess the reliability of estimates of the increase in global deep-ocean heat storage, contributing significantly to studies showing that this increase amounts to 8–20% of that in the upper ocean (Kouketsu et al. 2011). Table 1 summarizes global ocean data synthesis efforts using smoother methods, updating the review by Schiller et al. (2013).

Table 1 List of four-dimensional oceanic applications of smoother methods. This is an update of Schiller et al. (2013) and is not a comprehensive list of all applications.

Estimation of the long-term global ocean state is thus becoming possible. At this stage, the role of the ocean in global change should be further elucidated through a comprehensive understanding of changes in the subsurface ocean, which requires a more accurate understanding of mid-depth and deep ocean dynamics. To obtain more accurate estimates, it will be vital to expand subsurface observations, for example by enhancing repeat hydrography (http://www.go-ship.org), expanding the Argo array of autonomous profiling floats (http://www.jcommops.org/board?t=Argo), and extending it into the deep ocean (http://www.jamstec.go.jp/ARGO/deepninja).

The performance of numerical models is also recognized as an important factor. Some research groups have constructed state-of-the-art basin-scale state estimates with horizontal resolutions of 1/6° to 1/10° (SOSE: Mazloff et al. 2010; FORA-WNP30: Usui et al. 2017), paving the way toward a new generation of global state estimation. Another promising direction is the use of observational information that does not correspond to model variables and has therefore been difficult to assimilate directly.

3 Ocean state estimation optimized by controlling vertical mixing

Vertical mixing, especially diapycnal diffusivity, is critical for the energetics of the global ocean through its role in the meridional overturning circulation and in heat and mass budgets (e.g., Munk and Wunsch 1998). The global dissipation of about 2 TW is thought to be supplied mostly by internal-wave energy sources from tides and winds (e.g., Kunze 2017). Mixing is highly intermittent in space and time (e.g., Kunze et al. 2006) and depends on the ocean state, the energy sources, and the bottom topography through wave dynamics (e.g., Hibiya et al. 2017). Of these contributions, deep-ocean mixing mainly provides the downward buoyancy flux that maintains the global meridional circulation, whereas mixing in the main thermocline is an order of magnitude smaller (e.g., Lumpkin and Speer 2007). Deep-ocean mixing is thus an important factor in refining ocean state estimation. In numerical models, diffusion coefficients are commonly treated as external model parameters rather than model variables.

In this context, it is important to determine whether model results change appreciably when the model's diffusion coefficient is changed, that is, whether it is appropriate to use the diffusion coefficient as a control variable. The dependence of numerical solutions on the magnitude of diffusion coefficients has been studied since the early stages of ocean circulation model development (e.g., Bryan 1987; Cummins et al. 1990; Sasaki et al. 2012; Richards et al. 2012; Melet et al. 2013; Oka and Niwa 2013). Recently, Furue et al. (2015) and Jia et al. (2015) examined in detail how different vertical diffusion coefficients in different ocean basins affect the representativeness of the model. Their results indicate that applying vertical diffusion coefficients with a spatiotemporal distribution helps reproduce realistic ocean circulation fields by reducing representativeness errors. Another study (Niwa and Hibiya 2004) evaluated the three-dimensional distribution of tidal mixing using a tidal model. All these studies have important implications for understanding the distribution of vertical mixing and the dynamics of ocean circulation.

Attempts have been made to optimize diffusion coefficients at the ocean basin scale using general circulation models and conventional ocean observation data. Liu et al. (2012), using the G-ECCO system, applied the four-dimensional variational adjoint method to optimize global vertical and horizontal diffusion coefficients as control variables, using data such as temperature, salinity, and sea surface height. As a result, the model-data misfit (cost) for temperature, salinity, and sea surface height anomalies was reduced by 10–20%, and the misfit in mean sea level by 45%. Liu et al. (2014) analyzed the geographic distribution of the optimized diffusion parameters (Fig. 1) and proposed a new parameterization based on their correlation with seafloor topography. Under the strong-constraint formalism, the effects of internal waves generated by surface winds and of long-propagating waves from remote sources are implicitly excluded. These are practical examples that can be adapted to other models and help elucidate the dynamics related to diffusion.
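To make the adjoint idea concrete, the following is a minimal sketch, not the G-ECCO implementation: a single constant diffusivity in a hypothetical one-dimensional diffusion model is recovered from synthetic, noise-free observations (a twin experiment). The gradient of the cost with respect to κ is obtained from a hand-written discrete adjoint, and because the model equations are imposed exactly, this is a strong-constraint estimate. The grid, time step, and initial profile are illustrative assumptions.

```python
import numpy as np

# Toy strong-constraint (adjoint) estimate of one vertical diffusivity.
# Assumptions: 1-D diffusion dT/dt = kappa * d2T/dz2, explicit Euler steps,
# fixed boundary values, and synthetic noise-free observations of every state.

def laplacian(nz, dz):
    """Second-difference matrix; boundary rows are zero so end values stay fixed."""
    L = np.zeros((nz, nz))
    for i in range(1, nz - 1):
        L[i, i - 1] = L[i, i + 1] = 1.0 / dz**2
        L[i, i] = -2.0 / dz**2
    return L

def forward(kappa, T0, L, dt, nsteps):
    """Integrate the model forward and return the list of states T^0..T^N."""
    states = [T0]
    for _ in range(nsteps):
        states.append(states[-1] + dt * kappa * (L @ states[-1]))
    return states

def cost_and_grad(kappa, T0, obs, L, dt):
    """J = 0.5 * sum_n ||T^n - obs^n||^2 and dJ/dkappa from the discrete adjoint."""
    nsteps = len(obs)
    states = forward(kappa, T0, L, dt, nsteps)
    misfit = [states[n + 1] - obs[n] for n in range(nsteps)]
    J = 0.5 * sum(float(m @ m) for m in misfit)
    M = np.eye(len(T0)) + dt * kappa * L        # tangent-linear one-step operator
    lam = misfit[-1]                            # adjoint state at the final step
    grad = 0.0
    for n in range(nsteps - 1, -1, -1):
        grad += float(lam @ (dt * (L @ states[n])))  # explicit dT^{n+1}/dkappa
        if n > 0:
            lam = misfit[n - 1] + M.T @ lam          # backward adjoint step
    return J, grad

# Twin experiment: generate observations with kappa_true, then recover it.
nz, dz, dt, nsteps = 50, 1.0, 0.2, 100
z = np.arange(nz) * dz
T0 = np.exp(-((z - 25.0) / 5.0) ** 2)
L = laplacian(nz, dz)
kappa_true = 1.0
obs = forward(kappa_true, T0, L, dt, nsteps)[1:]

# In this smooth setup J(kappa) has a single minimum, so bisect on the
# sign of the adjoint gradient (dt * kappa / dz**2 <= 0.4 keeps it stable).
lo, hi = 0.2, 2.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if cost_and_grad(mid, T0, obs, L, dt)[1] > 0.0:
        hi = mid
    else:
        lo = mid
kappa_est = 0.5 * (lo + hi)
print(f"estimated kappa = {kappa_est:.4f}")
```

Realistic systems control far more parameters and typically rely on gradient-based (e.g., quasi-Newton) optimizers; bisection works here only because there is one scalar parameter with a single minimum.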

Fig. 1

Distribution of the estimated kgmskew, the eddy-induced thickness advection parameter (m² s⁻¹), at 1160 m depth, as applied by Liu et al. (2014). This parameter represents the skew part of the eddy-mixing scheme of Gent and McWilliams (1990) and Gent et al. (1995) (Eden et al. 2007). Black contours show the bottom depth H (m), and green contours show the barotropic stream function (Sv) (Liu et al. 2014)

Toyoda et al. (2015) applied the Green's function method (e.g., Menemenlis et al. 2005) to temperature and salinity data to blend several existing vertical diffusion schemes in optimal proportions. They assumed a simple linear combination and obtained the optimal mixing ratio for the vertical diffusion coefficients (Fig. 2). The mixed layer scheme was excluded from this optimization. As a result, they improved the reproduction of the water temperature distribution and the circulation field, mainly in the deeper layers. Moreover, their method greatly reduced the degrees of freedom and, by adopting a Monte Carlo-like strategy, achieved the optimization efficiently with relatively modest computational resources.
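The blending idea can be illustrated with a toy Green's-function calculation, sketched here under stated assumptions: the three diffusivity profiles are hypothetical shapes (uniform, surface-intensified, bottom-intensified) standing in for the TJN, GGT, and HSM schemes, the "model" is one-dimensional diffusion, and the observations are synthetic. Each Green's function is the model response to a small perturbation of one weight, and a least-squares fit of those responses to the misfit updates the weights. The classic approach takes a single linearized step; because this toy model is nonlinear in the weights, the step is repeated here (a Gauss-Newton iteration).

```python
import numpy as np

# Toy Green's-function fit of weights in kappa(z) = sum_i w_i * K_i(z).
# The profiles K_i are hypothetical stand-ins, not the TJN/GGT/HSM schemes.
nz, dz, dt, nsteps = 41, 1.0, 0.05, 400
z = np.arange(nz) * dz                      # depth, positive downward
zf = 0.5 * (z[:-1] + z[1:])                 # interior cell faces
H = z[-1]
K = np.stack([np.ones_like(zf), np.exp(-zf / 8.0), np.exp(-(H - zf) / 8.0)])
T0 = np.sin(2 * np.pi * z / 20.0) + 0.5 * np.sin(2 * np.pi * z / 7.0)

def model(w):
    """Flux-form explicit diffusion with no-flux ends; returns every 50th state."""
    kappa = w @ K
    T, out = T0.copy(), []
    for n in range(1, nsteps + 1):
        flux = -kappa * np.diff(T) / dz     # downgradient flux at each face
        tend = np.zeros_like(T)
        tend[:-1] -= flux / dz              # flux leaves the upper cell
        tend[1:] += flux / dz               # and enters the lower cell
        T = T + dt * tend
        if n % 50 == 0:
            out.append(T.copy())
    return np.concatenate(out)

def cost(w, obs):
    r = model(w) - obs
    return 0.5 * float(r @ r)

w_true = np.array([0.3, 1.0, 1.5])
obs = model(w_true)                          # synthetic, noise-free observations

w = np.array([1.0, 1.0, 1.0])                # first guess
for _ in range(15):
    base = model(w)
    # Green's functions: response to a small perturbation of each weight.
    G = np.column_stack([(model(w + 1e-4 * e) - base) / 1e-4 for e in np.eye(3)])
    dw = np.linalg.lstsq(G, obs - base, rcond=None)[0]
    # Halve the step until the misfit decreases; weights kept in [0, 3]
    # so that the explicit scheme stays stable.
    J, step = cost(w, obs), 1.0
    while step > 1e-6:
        trial = np.clip(w + step * dw, 0.0, 3.0)
        if cost(trial, obs) < J:
            w = trial
            break
        step *= 0.5
print("estimated weights:", np.round(w, 3))
```

Only three weights are estimated instead of a full three-dimensional diffusivity field, which is the reduction in degrees of freedom that makes this kind of optimization inexpensive.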

Fig. 2

Horizontally averaged vertical profiles of vertical diffusivity for various schemes. Toyoda et al. (2015) consider a linear combination of three vertical diffusivity schemes: Type III of Tsujino et al. (2000), one of the most skillful background vertical diffusivities (TJN; green); Gargett’s (1984) state-dependent scheme (GGT; yellow); and Hasumi and Suginohara’s (1999) scheme for bottom-intensified vertical diffusivity (HSM; blue). The optimized profile (red) is the best linear combination of the three schemes (Toyoda et al. 2015)

These results may be highly model-dependent, because the dynamics represented by the diffusion coefficients may differ with model resolution and other factors. Some distinctive state estimations instead revise vertical mixing in the model implicitly, by controlling model errors together with the oceanic initial conditions (DeVries and Primeau 2011; DeVries and Holzer 2019). Whichever approach is selected, careful comparison and verification against direct vertical-mixing observations (e.g., Waterhouse et al. 2014) and against estimates of vertical mixing derived from existing temperature and salinity data, for example from the global Argo array (Whalen et al. 2012), will be essential.

4 Data synthesis of vertical-mixing observations

Vertical-mixing observations are far fewer than temperature and salinity observations. In addition, vertical mixing has remarkably high spatiotemporal variability. Although Waterhouse et al. (2014) compiled the available microstructure profiles to detect global patterns of diapycnal mixing, it is generally difficult to construct a continuous global map of mixing from observations alone. The data synthesis experiments with global numerical models described in the previous section refined the model parameters (vertical diffusion coefficients) using observations of temperature, salinity, and sea surface height, but not of vertical mixing. Thus, vertical-mixing observations themselves have not yet been synthesized. Here, we discuss technical issues specific to synthesizing mixing data with numerical models and effective approaches to address them.

For observed quantities that are also computed as variables in the numerical model, such as water temperature and salinity, data synthesis is relatively easy to envision. The nudging method, one of the simplest and most traditional approaches, could be used for this purpose. In this method, the model variables are restored toward the observed values. Because the water whose properties are modified in this way is then transported by advection and diffusion, the method produces a continuous data set with a distribution close to the observations. However, a map of vertical mixing cannot be created in the same manner, because the turbulent energy dissipation rate obtained from observations is not computed in most numerical models and is not a conservative variable.
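A toy sketch of nudging, under assumed settings (a periodic one-dimensional channel with upwind advection and diffusion, a relaxation time tau, and observations taken from a "true" run at every fifth grid point), shows how relaxation at sparse points combined with transport yields a field close to the observations everywhere. All parameter values are illustrative.

```python
import numpy as np

# Toy nudging (Newtonian relaxation) on a periodic 1-D channel.
# Assumptions: upwind advection + diffusion, a "true" run supplying
# observations at every 5th grid point, and a model started from zero.
nx, dx, dt, u, kappa, tau = 100, 1.0, 0.5, 1.0, 0.2, 5.0
x = np.arange(nx) * dx
observed = np.zeros(nx, dtype=bool)
observed[::5] = True                        # sparse observation locations

def step(T, obs=None):
    """One explicit step of dT/dt = -u dT/dx + kappa d2T/dx2 [+ nudging]."""
    adv = -u * (T - np.roll(T, 1)) / dx                       # upwind, u > 0
    dif = kappa * (np.roll(T, -1) - 2 * T + np.roll(T, 1)) / dx**2
    tend = adv + dif
    if obs is not None:
        # relax toward the observations only where they exist
        tend = tend + np.where(observed, (obs - T) / tau, 0.0)
    return T + dt * tend

truth = np.sin(2 * np.pi * x / (nx * dx))
free = np.zeros(nx)                         # unconstrained model run
nudged = np.zeros(nx)                       # run nudged toward observations
for _ in range(1000):
    nudged = step(nudged, truth)            # observations of the current truth
    free = step(free)
    truth = step(truth)

rms = lambda a: float(np.sqrt(np.mean(a**2)))
print(f"rms error: free = {rms(free - truth):.3f}, nudged = {rms(nudged - truth):.2e}")
```

Unobserved points are corrected only through advection and diffusion from the nudged points, which is the mechanism described above. The dissipation rate could not be handled this way, because it is neither a model variable nor a transported quantity.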

Optimizing model parameters (vertical diffusion coefficients) with temperature, salinity, and sea surface height anomaly observations, as described above, can provide a realistic, dynamically self-consistent ocean state together with optimal parameters. This is a form of data synthesis of observed information as well as a dynamical interpolation constrained by a numerical model (its formalism or model equations).

By analogy with the use of temperature, salinity, and sea surface height observations, one can consider an approach that uses vertical-mixing observations to optimize control variables that include parameters such as vertical diffusion coefficients together with the temperature, salinity, and circulation fields. In this context, the distribution of vertical diffusion coefficients obtained with information from observations, including vertical mixing, can be regarded as a data synthesis of vertical-mixing observations. However, if most of the model representation errors in such a system are compensated by modifying vertical-mixing parameters, the reliability of the resulting diffusivity values is likely to be compromised, and the optimized vertical-mixing map may not be usable as an observational data synthesis. Therefore, a physically based vertical-mixing parameterization (e.g., St. Laurent et al. 2002) should be adopted wherever possible, and the results of a careful optimization that mitigates model representation errors should be verified.

5 Discussion

With the expansion of vertical-mixing observations, Yasuda et al. (http://omix.aori.u-tokyo.ac.jp), who proposed the new academic field of "ocean mixing", have started to synthesize these observations. This synthesis is an attempt to refine the reproduction of the mid-depth and deep ocean state by utilizing vertical-mixing observations. In this framework, the adjoint-based four-dimensional variational method was applied to estimate the ocean state and the optimal three-dimensional distribution of vertical diffusivity by modifying the parameters of tidally induced vertical-mixing parameterizations driven by the output of a global barotropic tide model (St. Laurent et al. 2002; Hibiya et al. 2006). The parameterization of Hibiya et al. (2006) more comprehensively accounts for remote sources, mainly associated with bottom topography.

St. Laurent et al. (2002) proposed a mixing scheme that is now widely used in ocean general circulation models (OGCMs). Following their scheme, we represent the turbulent dissipation rate, \(\varepsilon\), as

$$\varepsilon = \frac{q}{\rho}\, E_g(x, y)\, F(z) \tag{1}$$

where q is the local dissipation efficiency, \(\rho\) is the reference density of seawater (kg m–3), \({E}_{g}\) (W m–2) is the rate of conversion of barotropic tidal energy into internal waves, and \(F\left(z\right)\) is a vertical distribution function that assumes \(\varepsilon\) decays exponentially away from the ocean bottom as

$$F(z) = \frac{\exp\left(-\frac{H+z}{h}\right)}{h\left(1-\exp\left(-\frac{H}{h}\right)\right)} \tag{2}$$

where z is the vertical coordinate (positive upward), \(H\) is the bottom depth, and \(h\) is the vertical decay scale. The vertical diffusivity, \(\kappa\), is calculated from the Osborn (1980) relation for the mechanical energy budget of turbulence, \(\kappa = \Gamma \varepsilon N^{-2}\), where \(\Gamma\) is the mixing efficiency of turbulence (commonly taken as 0.2) and \(N\) is the buoyancy frequency. Whereas St. Laurent et al. (2002) assumed q to be constant, Tanaka et al. (2010) showed that q can take different values for subinertial and superinertial tidal frequencies, referred to as \({q}_{\mathrm{sub}}\) and \({q}_{\mathrm{sup}}\), respectively.
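Equations (1) and (2) and the Osborn relation can be evaluated for a single water column as in the sketch below. The bottom depth, the value of E_g, and the uniform N² are illustrative assumptions, and q = 1/3, h = 500 m, Γ = 0.2, and ρ = 1035 kg m–3 are assumed parameter values rather than those of any particular model configuration.

```python
import numpy as np

def vertical_structure(z, H, h):
    """F(z) of Eq. (2): decays upward from the bottom (z = -H) over the scale h
    and integrates to 1 over -H <= z <= 0, so it has units of 1/m."""
    return np.exp(-(H + z) / h) / (h * (1.0 - np.exp(-H / h)))

def tidal_mixing(z, H, Eg, N2, q=1/3, h=500.0, rho=1035.0, gamma=0.2):
    """Return the dissipation rate of Eq. (1) (W/kg) and the Osborn
    diffusivity kappa = gamma * eps / N^2 (m^2/s) for one water column.
    Default parameter values are illustrative assumptions."""
    eps = q * Eg / rho * vertical_structure(z, H, h)
    kappa = gamma * eps / N2
    return eps, kappa

H = 4000.0                                   # assumed bottom depth (m)
z = np.linspace(-H, 0.0, 4001)               # 1-m grid, z positive upward
N2 = np.full_like(z, 1.0e-6)                 # assumed uniform N^2 (s^-2)
Eg = 1.0e-2                                  # assumed E_g at this column (W/m^2)
eps, kappa = tidal_mixing(z, H, Eg, N2)
print(f"bottom kappa = {kappa[0]:.2e} m^2/s, surface kappa = {kappa[-1]:.2e} m^2/s")
```

Integrating ρε over the column returns qE_g, consistent with q being the fraction of the converted tidal energy that is dissipated locally.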

Osafune et al. (2014) calculated the adjoint sensitivities of these parameters, along with some other parameters, using turbulent dissipation rate observations along sections in the North Pacific, and then used these sensitivities as a guide for optimizing the parameters (Fig. 3). The sensitivity here is the gradient of a cost function measuring the model-observation misfit with respect to each parameter; it indicates how the vertical diffusivity should be changed to reduce the difference between the observed data and the numerical model results. This is one approach to synthesizing vertical-mixing observations.
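As a hypothetical illustration of what such a sensitivity looks like, the sketch below differentiates an assumed log-space misfit between modelled and "observed" dissipation rates with respect to q and h in Eqs. (1) and (2). It is not the adjoint of an OGCM: here ε depends on the parameters analytically, and the observation depths and values are synthetic.

```python
import numpy as np

# Hypothetical sketch: sensitivity of a dissipation-rate misfit to q and h.
# Assumed cost: J = 0.5 * sum_k (ln eps_model(z_k) - ln eps_obs(z_k))^2.

def log_eps(z, q, h, H=4000.0, Eg=1.0e-2, rho=1035.0):
    """ln of Eq. (1) with F(z) from Eq. (2); H, Eg, rho are assumed values."""
    return (np.log(q * Eg / rho) - (H + z) / h - np.log(h)
            - np.log(1.0 - np.exp(-H / h)))

def cost_and_sensitivity(q, h, z, log_obs, H=4000.0):
    """Return J and its analytic gradient (dJ/dq, dJ/dh)."""
    r = log_eps(z, q, h, H) - log_obs
    dl_dq = 1.0 / q
    e = np.exp(-H / h)
    dl_dh = (H + z) / h**2 - 1.0 / h + (H / h**2) * e / (1.0 - e)
    return 0.5 * float(r @ r), np.array([np.sum(r * dl_dq), np.sum(r * dl_dh)])

z_obs = np.array([-4000.0, -3800.0, -3500.0, -3000.0, -2500.0])  # assumed depths
log_obs = log_eps(z_obs, q=0.25, h=300.0)                         # synthetic data
J, g = cost_and_sensitivity(1/3, 500.0, z_obs, log_obs)
# A positive component means the parameter should be decreased to reduce J,
# a negative one that it should be increased.
print(f"J = {J:.3f}, dJ/dq = {g[0]:.3f}, dJ/dh = {g[1]:.3e}")
```

In a full state estimation the corresponding gradients come from the adjoint model, and the misfit also involves temperature and salinity, as in Fig. 3.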

Fig. 3

Example of adjoint signal distribution at a 3000-m depth in the ongoing ESTOC for the case that includes turbulent dissipation rate observations. Red shades indicate the mixing parameter values should be further reduced, and blue indicates they should be further increased to reduce the difference between observation and state estimation for water temperature and salinity. Parameters are obtained from St. Laurent et al. (2002). (Osafune, personal communication)

The dataset obtained from this optimization will provide new insights into various scientific issues concerning the ocean interior that have attracted attention in recent years, such as identifying the mechanisms of deep-ocean climate change and of changes in the meridional circulation, including deep-water warming. In addition, the resulting distribution of optimized vertical mixing, although it contains model representativeness errors, should be validated as a mapping product that synthesizes observations. Continued research along these lines, supported by sustained observations, is expected to lead to breakthroughs in elucidating the influence of vertical mixing on ocean circulation and global change.

Furthermore, advances in computer science may soon make it possible to construct new numerical model frameworks in which turbulence quantities are treated as model variables. Dense and wide-ranging direct observations are also expected from turbulence measurements on platforms such as Argo floats and underwater gliders, which are now entering the practical stage.