1 Introduction

Nowadays the ocean is no longer seen as a passive agent being modulated by the chaotic, high-frequency atmospheric variability, but as an integrator that feedbacks onto the atmosphere and indeed provides memory (and predictability) to the system thanks to its longer persistence. The conceptual model by Hasselmann (1976; hereafter H76) of atmosphere-driven oceanic variability, while valid for most of the mid-latitude ocean surface, is questionable over regions characterized by intense mesoscale activity, e.g. the western boundary currents (Chelton et al. 2004; Minobe et al. 2008; Small et al. 2008; O’Neill et al. 2010; Putrasahan et al., 2013, Chelton and Xie 2015; Roberts et al. 2017c). Bishop et al. (2017; hereafter B17), based on theoretical arguments and observational evidence, identified two well-separated regimes by analysing the lead-lag covariance between SST and THF and between SST tendency and THF: an atmosphere-driven regime, dominating over open ocean regions (corresponding to H76’s view) and an ocean-driven regime prevailing over WBC areas.

Early attempts to analyse the nature of air-sea interaction in the context of coupled general circulation models made use of coarse resolution models, thereby neglecting a substantial fraction of the intrinsic variability within the oceanic and atmospheric sub-systems. As an example, von Storch (2000) analysed air–sea coupling in a long integration with a T21 atmospheric model coupled to a 4° large-scale geostrophic ocean model, and found a predominant role of the atmosphere as a driver of oceanic variability.

With the advent of high-resolution observational data and climate models, the coupling processes at the air-sea interface are being revisited. Kirtman et al. (2012) examined the impact of resolved ocean fronts and eddies on the simulation of large-scale climate in the NCAR CCSM3.5 model. By contrasting two different model configurations, a low-resolution 1° ocean and an eddy-resolving 0.1° ocean coupled to the same atmospheric model with 0.5° horizontal resolution, they found that local air-sea feedbacks are significantly modified by the increased ocean resolution. In particular, the explicit resolution of mesoscale oceanic features leads to a significant impact of SST variability on the air-sea coupling (SST forcing the atmosphere) over the extratropics, as opposed to the low-resolution ocean configuration that still shows a predominance of atmospheric forcing on the SST variability. A similar analysis conducted by Putrahasan et al. (2017) using the same model inspected in Kirtman et al. (2012), finds consistent indications of a strong imprint of eddy-induced SST anomalies on the air-sea fluxes over the Loop Current region, in the Gulf of Mexico. More recently, Small et al. (2019) extended the analysis of B17 to two configurations of the CESM1 climate model, at standard and high resolution, coupling the same 0.25° atmosphere to a 1° and 0.1° ocean model, respectively. Their findings are consistent with Kirtman et al. (2012) results, showing that over the WBC regions the standard resolution model is dominated by an atmosphere-driven regime, while the high-resolution (ocean eddy-resolving) model is dominated by an ocean-driven regime. Interestingly, in another single-model study, Roberts et al. (2016) find that realistic SST-heat flux correlations are already obtained using an eddy-permitting configuration of the Met Office climate model, with little further improvement when moving to an eddy-resolving version of the same model. Together, these analyses point to (either fully or partly resolved) mesoscale geostrophic turbulence in the ocean as a key feature in model simulations to properly reproduce the observed correlation between SST anomalies and turbulent heat fluxes at the air-sea interface.

These results suggest that model resolution (particularly for the ocean component) can play a crucial role in the representation of coupled ocean–atmosphere processes over oceanic regions characterized by baroclinically unstable current systems. However, to the authors’ knowledge, no systematic inspection of mid-latitude air-sea interaction in a multi-resolution multi-model context (i.e., in a hierarchy of model simulations performed at different spatial resolutions) has been performed so far. In this study the role of model resolution in the simulation of air-sea interaction over the eddy-rich Gulf Stream region is systematically assessed in a multi-model ensemble of present-climate global experiments performed following the common CMIP6 HighResMIP protocol (Haarsma et al. 2016). The HighResMIP framework provides the unprecedented opportunity of examining the representation of this highly scale-dependent phenomenon in a coordinated set of experiments conducted using different models at different spatial resolutions.

In Sect. 2 we describe the experimental setup, models and metrics for characterizing the ocean–atmosphere interaction. Section 3 provides indications on where to obtain the model output and on the observational product used in this study. Results are reported in Sect. 4 and further discussed in Sect. 5.

2 Methods

2.1 Experimental setup

In the present study, global coupled simulations from the EU-funded H2020 PRIMAVERA multi-model ensemble, following the HighResMIP (Haarsma et al. 2016) protocol, are used. HighResMIP is a CMIP6-endorsed model inter-comparison effort specifically designed to assess the role of model resolution in the representation of processes relevant to the climate system. The experimental design of HighResMIP consists of both atmosphere-only and coupled simulations, generally performed at standard (~ 100 km or coarser) and enhanced (~ 25 km) horizontal resolution in the atmosphere and the ocean. Here, 100-year coupled integrations of the present climate (referred to as “control-1950” in HighResMIP) are considered, forced with time-invariant radiative forcings (including greenhouse gases, ozone and aerosol loadings) representative of the 1950s. Each control-1950 integration is initialized in 1950 after a spin-up of 30 or 50 years. The spin-up is initialised from the ocean at rest with temperature and salinity values corresponding to 1950 from the EN4 dataset (Good et al. 2013). After the spin-up, models are further integrated for an additional 100 years. For each model, one single realization is provided. The use of single-member simulations is justified by the ergodic assumption, postulating that the time-mean behaviour of an individual model “trajectory” equals the ensemble mean applied to a multi-member set of realizations. The soundness of this assumption is granted by the use of control integrations performed under steady forcing conditions.

Six different coupled models have been analysed, 13 configurations in total. The corresponding horizontal resolutions of these configurations are summarized in Table 1, while a detailed documentation can be found in the following references: ECMWF-IFS, Roberts et al. (2018, 2020); CMCC-CM2, Cherchi et al. (2019), Scoccimarro et al. (2020); CNRM-CM6.1, Voldoire et al. (2019); MPI-ESM1-2, Gutjahr et al. (2019); EC-Earth3P, Haarsma et al. (2019); HadGEM3-GC3.1, Roberts et al. (2019).

Table 1 Summary of coupled models and their horizontal resolutions as used in PRIMAVERA project to complete the CMIP6 HighResMIP control-1950 experiments

Models contributing to HighResMIP use different ways to assess sensitivity to resolution. The model configurations analysed in this study essentially fall in two categories: enhanced resolution by increasing both ocean and atmosphere resolution (hereafter, group A); enhanced resolution by increasing only atmosphere resolution (hereafter, group B). HadGEM3-GC31 is an exception, since three configurations have been considered (LL, MM, HM): LL and MM stand for low-low resolution and medium-medium resolution, respectively, for the atmosphere–ocean components, while MM and HM share the same ocean but differ in the atmosphere resolution; therefore, the couplet LL and MM is included in group A, while MM and HM are in group B. Note that after this partition, models in group A share the property of featuring a low-resolution configuration with a “laminar” ocean component (i.e., with a spatial definition of ~ 100 km or coarser), while models in group B include a better resolved ocean (ranging from eddy-permitting—O(25 km)—up to 40 km nominal resolution). For each model, we retrieve monthly mean SST and THF fields, with positive THF indicating heat transfer out of the ocean into the atmosphere.

Some of the analyses presented in this study are also extended to a very high-resolution, mostly ocean eddy-resolving, configuration of the MPI-ESM model (MPI-ESM-ER; Gutjahr et al. 2019), run with a T127 atmosphere (1°) and a 0.1° (~ 10 km) ocean, under fixed 1950s forcing conditions (same as in the control-1950 simulations). This simulation is not fully compliant with the HighResMIP protocol, since MPI-ESM-ER model differs from HR and XR configurations not only for the resolution but also for the ocean physics parameterization. Specifically, MPI-ESM-ER model implements the ocean vertical mixing scheme of Pacanowski and Philander (1981) while HR and XR make use of the KPP (K-profile parameterization) scheme of Large et al. (1994). Due to this deviation from the common experimental protocol, results from the MPI-ESM-ER simulation are not strictly comparable to the other HighResMIP simulations. Despite this lack of compliance, analyses of MPI-ESM-ER model were included in our study since, through the comparison with the lower resolution MPI-ESM-HR and -XR companions, they provide some useful insight on the role of ocean mesoscale on mid-latitude air-sea interactions (see Supplemental Material).

2.2 Metrics: cross-covariance functions

In order to characterize the nature of air-sea interaction, the cross-covariance and correlation functions linking SST (and SST tendency) with turbulent (sensible plus latent) surface heat fluxes, calculated at different monthly time-lags, are used. By assessing whether there is any significant ocean–atmosphere co-variability, and determining the respective lead-lag time intervals, this approach (first suggested by Frankignoul and Hasselmann 1977) indicates whether SST fluctuations are driven by processes intrinsic to the atmosphere or to the ocean.

The physical interpretation of this diagnostic is well illustrated in Wu et al. (2006) and Bishop et al. (2017; see their Fig. 1a, b). These authors apply a cross-covariance analysis to a simple, two-equation stochastic energy balance model (EBM) designed to represent ocean–atmosphere interactions. Based on their analysis, when the SST variability is dominated by atmospheric weather, SST and THF are expected to be in quadrature (leading to an anti-symmetric lead-lag relationship), while the SST tendency is negatively correlated at the zero lag with THF anomalies, consistent with an upper ocean cooling driven by the release of heat from the ocean into the atmosphere. Alternatively, when the SST variability is primarily governed by processes intrinsic to the ocean, the SST-THF zero-lag correlation is positive, indicating a damping role of the turbulent heat fluxes on the SST anomalies generated by ocean dynamics (e.g., positive THF countering the warm SST anomalies associated with the heat convergence due to oceanic currents), while SST tendency and THF anomalies are in quadrature. Bishop et al. (2017) further extended the work of Wu et al. (2006) and showed a marked consistency between the theoretical EBM model results and observations.

Fig. 1
figure 1

SST-THF covariance maps in the North Atlantic (°C W m−2), computed for [− 1, 0, + 1]-month lag (top, mid, bottom row, respectively) for HighResMIP models and observations. SST leads for negative lags. Models in group A are shown. The location of the boxes used to evaluate correlations over the GS and the SG (shown in Figs. 5, 6, 7, 8) is indicated in the top-right panel (green square)

In the following, the functional shape of the lead-lag covariance and correlation between SST (and SST tendency) and turbulent heat flux is employed to identify the leading driver of SST fluctuations in a region encompassing both the Gulf Stream system and the North Atlantic subtropical basin. These diagnostics provide a multi-variate constraint to verify the fidelity of the scrutinized multi-model ensemble to the observed regimes of ocean–atmosphere interaction.

Time series of monthly-mean SST and THF anomalies (computed after removing the monthly climatological cycle) are used to calculate covariance patterns and correlations. SST tendencies are calculated using a centred-difference numerical approximation.

3 Data

The monthly mean model output used for this work is available on the Earth System Grid Federation (ESGF) nodes under references: HadGEM3-GC31 (Roberts 2017a, b, c), ECMWF-IFS (Roberts et al. 2017a, b), CNRM-CM6-1 (Voldoire 2019a, b), CMCC-CM2 (Scoccimarro et al. 2017a, b), EC-Earth3P (EC-Earth 2018, 2019), MPI-ESM1-2 (von Storch et al. 2017a, b).

Model results are compared against observational estimates. For THF, monthly mean data of latent and sensible heat flux from the Japanese Ocean Flux Datasets with Use of Remote Sensing Observations version 3 (J-OFURO3; Tomita et al. 2019) are used. J-OFURO3 is an updated version of the previously released J-OFURO (Kubota et al. 2002) and J-OFURO2 (Tomita et al. 2010) datasets, and is provided on a 0.25° resolution grid for the 1988–2013 period. SSTs are the ensemble median of multiple global SST products (Tomita et al. 2019). J-OFURO3 presents the added value of featuring a higher resolution compared to the Woods Hole Oceanographic Institution 1° OAFlux dataset (Yu et al. 2008) and a longer time coverage compared to the 0.25° SeaFlux (http://seaflux.org/) dataset, the latter covering the 1998–2007 period. Due to these shortcomings, we did not consider OAFlux and SeaFlux as verification datasets in this study (see B17 and Small et al. 2019 for a more in-depth evaluation of the observational uncertainty associated with air-sea flux datasets).

4 Results

4.1 Covariance patterns

Figures 1 and 2 show grid-point covariance of monthly mean SST and THF anomalies with a ± 1-month time-lag interval for model groups A and B, respectively. Observational estimates are also shown in each figure, for comparison. All models show a well distinguishable signature of the Gulf Stream (GS) pathway, marked by the highest covariance values, contrasting with the open ocean, subtropical gyre (SG) region, featuring substantially lower covariance. Along the GS axis, the observed lead-lag relationship between SST and THF is particularly well captured by high-resolution configurations, with covariances reaching their maximum at lag zero and exhibiting symmetrically lower values for − 1 and + 1 lags. This symmetric structure is not equally well represented in the low-resolution model versions, featuring a higher covariance at lag − 1 compared to lag + 1 (see Sect. 4.2 for a more in-depth analysis). Over the subtropical Atlantic, the observed anti-symmetric structure of the covariance pattern, switching polarity from positive (lag − 1) to negative (lag + 1), is reasonably well captured by the models. Looking more in detail at the impact of resolution, a number of robust features emerge.

Fig. 2
figure 2

Same as Fig. 1 but for group B

Group A (Fig. 1). Compared to low-resolution (first panel for each model), high-resolution configurations (second panel for each model) show a systematic enhancement of the covariance strength over the GS, with covariance values exceeding the observed range, particularly over the western segment of the GS front. On the other hand, the use of higher resolution substantially improves the spatial structure of the covariance patterns, producing a more realistic tilt of the GS signature (i.e., SW-NE oriented, versus the more zonally elongated pattern exhibited by low-resolution configurations) and yielding a more continuous meandering as compared to the apparent two-lobe structure in low resolution. Indeed, all low-resolution configurations display an unrealistic maximum east of Newfoundland, in the GS extension region, which is absent in the high-resolution counterparts as well as in observations. The latter is likely mirroring a well-known bias of eddy-parametrized ocean models in representing the path of the North Atlantic Current nearby the Northwest Corner off Newfoundland (Marzocchi et al. 2015).

Group B (Fig. 2). Compared to group A, this subset of models exhibits a much weaker sensitivity to enhanced atmosphere resolution, suggesting that the ocean resolution is responsible for the low-vs-high resolution impact detected in models from group A.

Next, we focus on the covariance between SST tendency and THF (Figs. 3, 4). For group A (Fig. 3), most of the considerations made for SST-THF covariance hold, including the qualitative agreement with the observed pattern, and the sensitivity of the covariance magnitude to resolution. However, the improved representation of the covariance spatial structure with higher resolution is even more evident. Low-resolution leads to a severe underestimation of the covariance amplitude and, particularly for HadGEM3-GC3 and CNRM-CM6.1 models, to a reduction in the zonal extent of the GS signature. Again, by contrasting group A (Fig. 3) with group B (Fig. 4) it is clear that the ocean component drives most of the changes associated with resolution enhancement.

Fig. 3
figure 3

As Fig. 1 but for SST tendency and THF (°C W m−2 month−1). SST tendency leads for negative lags

Fig. 4
figure 4

As Fig. 2 but for SST tendency and THF (°C W m−2 month−1). SST tendency leads for negative lags

At this stage, it is worth noting that even models using a “laminar” ocean component in the low-resolution configuration (group A), capture the main features of the theoretical and observed functional laws associated with the SST-/SST tendency-THF covariance (Wu et al. 2006; Bishop et al. 2017), both in the western boundary current (GS) and in the open ocean subtropical gyre regions. This result (further documented and discussed in the following sections) somewhat contradicts the assumption that an explicit representation of mesoscale oceanic eddies is mandatory in order to capture the ocean-driven regime in the baroclinically unstable GS region. This idea stems in turn from the assumption that stochastic forcing representing the oceanic “weather” in a 1-dimensional EBM for ocean–atmosphere interaction (the No term in Wu et al. 2006) is conceptually equivalent to the action exerted by explicitly-resolved mesoscale eddies in a fully coupled general circulation model (GCM). Based on this conjecture, a model using a “laminar” ocean component should not be able to represent the ocean-driven regime, and the atmosphere-driven regime would dominate. This is not the case in the multi-model multi-resolution set of experiments analysed in this study. A possible interpretation of this result may be given invoking simple scaling arguments applied to the terms contributing to the mixed layer heat budget. In regions where SST gradients and/or ocean currents are weak (i.e., far from WBCs), the ocean advection term will have a consistently low magnitude, and tendencies in the surface temperature will be largely controlled by turbulent heat fluxes, leading to a prevailing atmosphere-driven regime. The opposite happens over WBCs, where ocean advection will be larger due to the stronger SST gradients/ocean current velocities, leading to a different balance in the mixed layer with the tendency of SST being governed by the temperature advection term. The eddy-parameterised oceans are missing the largest magnitude gradients and velocities, but they still have some representation of the WBC where velocities and surface temperature gradients have a larger magnitude than in the open ocean, so it is not surprising that the ocean advection contributions to the local heat budget (and their variance) are locally increased in the GS region, even if they are not as large as observed. This aspect is further discussed in the following sections.

4.2 Lead-lag correlation

In the previous section, the spatial structure of the SST/SST tendency-THF covariance has been explored in the ± 1-month time-lag range. The analysis highlighted the ability of PRIMAVERA models, operating at substantially different resolutions, to reproduce two well distinguishable regimes: an ocean-driven regime, typical of the GS system, and an atmosphere-driven regime, dominating the SG region. Having established the models’ fidelity in simulating this observed fingerprint of the mid-latitude air-sea interaction, we now focus on a more detailed analysis targeting the two sub-regions (namely GS and SG). Specifically, the functional shape of the SST/SST tendency-THF correlation is analysed for the longer ± 10-month time-lag range, and compared to observations. For this analysis, correlations are evaluated as an average of grid-point values over two boxes centred around [41° N, 60° W] and [30° N,40° W], representative of the areas dominated by the ocean-driven (GS) and the atmosphere-driven (SG) regime, respectively (shown in Figs. 1, 2, 3, 4, upper right panel).

The simulated correlation patterns of SST-THF and SST tendency-THF over the GS and the SG are shown in Figs. 5, 6, 7, 8, partitioned according to model groups A and B. In order to account for the uncertainty affecting the magnitude of the observed correlations due to both data sparsity in the earlier decades of the record, and to the increasing input of satellite data with time, estimates based on the more recent 2002–2013 period from J-OFURO3 dataset are also shown.

Fig. 5
figure 5

SST-THF (blue) and SST tendency-THF (red) correlation in the Gulf Stream, for low-resolution (dotted) and high-resolution (solid thin) configurations of Group A models, and observations (solid thick line and squares are used for 1988–2013 and 2002–2013 J-OFURO3, respectively). SST/SST tendency leads (lags) THF for negative (positive) time-lags. For HadGEM3-GC31 model the low (LL) and intermediate (MM) resolution configurations are compared

Over the GS (Fig. 5, 6) models reproduce the expected symmetric (anti-symmetric) functional law characterising the lead-lag relation between SST (SST tendency) and THF, typical of the ocean-driven regime, with a varying degree of realism. In particular, some of the model configurations feature an SST-THF relationship which substantially deviates from the symmetric shape around the lag zero, postulated by the theoretical stochastic model (e.g., B17). This is particularly evident in group A models (Fig. 5), with low resolution versions showing a systematic misrepresentation of the zero-lag correlation maximum (not present in their corresponding high-resolution counterparts), whereas group B models (Fig. 6) exhibit a closer agreement with the shape of the observed pattern, regardless of the resolution. The sensitivity of this feature to model and grid resolution is quantified by introducing an ad hoc metric measuring the degree of symmetry of the SST-THF correlations over the GS, hereafter termed Symmetry Index (SI). The SI is calculated as the absolute value of the difference between correlations at lags − 1 and + 1, normalised by the zero-lag value (which is typically positive):

Fig. 6
figure 6

Same as Fig. 5 but for group B models. For HadGEM3-GC31 model the intermediate (MM) and high (HM) resolution configurations are compared

$$\left|R\left(\tau =-1\right)-R(\tau =+1)\right|/R\left(\tau =0\right)$$

where R is the SST-THF correlation over the GS box (as in Figs. 5, 6), and τ is the time-lag (Figs. 7, 8). Note that, according to this definition, high (low) SI values correspond to large (small) deviations from symmetry. For perfectly symmetric lead-lag correlations, SI equals zero. Based on this metric (Fig. 9) low-resolution model configurations show a systematically lower degree of symmetry (i.e., larger SI values) compared to their high-resolution counterparts (the MPI-ESM model being an exception since SI is virtually identical for HR and XR configurations). It is interesting to notice that models belonging to group A (i.e., models changing both ocean and atmosphere resolution from low to high resolution configuration: EC-Earth3P, ECMWF-IFS, CNRM-CM6, HadGEM-GC3-LL/MM) undergo the largest SI reduction (improved symmetry) from low to high resolution, while relatively smaller change (or no change, as for MPI-ESM) is found in group B models (i.e., model couplets sharing the same ocean resolution in both low and high resolution configurations: HadGEM-MM/HM, CMCC-CM2, MPI-ESM). This suggests that the under-representation of the eddy activity in the ocean models is the main cause behind the large deviations from symmetry detected in lead-lag SST-THF covariance and correlations (Figs. 1, 2, 5, 6). It is also worth mentioning that even observational estimates (J-OFURO3) show some degree of asymmetry. However, there is a clear improvement (lower SI) once the most recent 2002–2013 part of the record is considered (filled circle in Fig. 9). Interestingly, eddy-permitting configurations broadly align with the J-OFURO3 2002–2013 estimate.

Fig. 7
figure 7

As Fig. 5 but for the North Atlantic Subtropical Gyre. SST/SST tendency leads (lags) THF for negative (positive) time-lags

Fig. 8
figure 8

As Fig. 6 but for the North Atlantic Subtropical Gyre. SST/SST tendency leads (lags) THF for negative (positive) time-lags

Fig. 9
figure 9

Symmetry Index computed according to relation (1) for models and observations over the GS (non-dimensional; see text for details). For each model, low (high) resolution configurations are marked with “x” (square). Note that models are ordered according to groups A and B (left to right) with HadGEM-GC3(A) indicating the [LL, MM] couplet (included in group A) and HadGEM-GC3(B) indicating the [MM, HM] couplet (included in group B). For J-OFURO3, an open (filled) circle is used to indicate that the estimate is based on the 1988–2013 (2002–2013) record

What determines the pronounced SST-THF spurious asymmetries found in the laminar coupled models over the GS is unclear. A possible interpretation is that these are indicative of a contribution from the atmospheric forcing, somehow cross-breeding the purely ocean-driven regime. Bearing in mind that both (ocean- and atmosphere-driven) regimes coexist (see Sect. 4.3), oceanic variability in eddy-parametrized models may be not sufficiently strong to counter the atmospheric weather influence, leading to a deviation from the functional shape predicted by the theoretical model for purely ocean-driven regimes.

The degree of symmetry is not the only aspect impacted by resolution. Consistent with Sect. 4.1 findings, high-resolution models in group A display larger SST-THF correlations around the zero-lag, compared to low resolution. Group B models, instead, reveal a smaller sensitivity to the enhanced atmospheric grid resolution, pointing to the primary role of the ocean resolution in setting the magnitude of the correlations. The uncertainty affecting the observational estimate does not allow any firm conclusion about the impact of resolution on the realism of the correlation values. However, it is worth noticing that (except for CNRM-CM6) all eddy-permitting model configurations (from both A and B groups) feature a zero-lag correlation value which is highly consistent with the 2002–2013 J-OFURO3 estimate.

Over the SG (Figs. 7, 8) models reproduce the observed anti-symmetric (symmetric) functional law characterising the lead-lag relation between SST (SST tendency) and THF, typical of the atmosphere-driven regime. In this case, compared to observations, the SST-THF simulated correlations appear to be overly symmetric around lag zero, thus more consistent with the EBM results shown in B17. Also, compared to the GS case, the impact of resolution is generally less marked, with model groups A and B showing a largely consistent behaviour. This is expected, since the open ocean, SG region is much less affected by eddy dynamics, and therefore eddy-permitting oceans are not key for realistically reproducing the relationship between upper ocean temperatures and turbulent heat fluxes.

4.3 Spatial scale dependency: regime transition and critical length-scale

In this section, following a methodology devised in B17, the dependency of the lead-lag SST-THF and SST tendency-THF co-variability on spatial scale is inspected. The analysis focuses on the GS region, where the influence of (either parameterized or partially resolved) oceanic eddies is larger (see Sects. 4.1, 4.2). This is achieved by applying to the original SST and THF fields a box-car spatial filter with a progressively increasing degree of smoothing. After that, correlations at different time-lags are calculated for each smoothing factor, leading to a two-dimensional distribution of correlations represented as a function of time-lag and smoothing degree.

For illustrative purposes, the correlation patterns resulting from the application of this procedure to the observations are shown in Fig. 10. A clear transition from a symmetric (anti-symmetric) to an anti-symmetric (symmetric) pattern under increasing degree of spatial smoothing is featured by SST-THF (SST tendency-THF) lagged correlations. The detected transition reveals how the progressive filtering of the smaller spatial scales induced by the smoothing leads to a dampening of the originally dominant ocean-driven regime over the GS region, in favour of a more atmosphere-driven regime, which is approached after a sufficiently strong spatial smoothing.

Fig. 10
figure 10

Observed SST-THF (top) and SST tendency-THF (bottom) correlation for the Gulf Stream, as a function of time-lag and spatial smoothing length (degree). SST/SST tendency leads (lags) THF for negative (positive) time-lags

The same diagnostic is then systematically applied to the multi-model ensemble. Following the approach adopted for the covariance analysis (Sect. 4.1), model results are clustered according to groups A and B, so as to better characterize the relative impact of ocean and atmosphere resolution (Figs. 11, 12, 13, 14). All models seem to be able to qualitatively represent the regime transition found in the observations (Fig. 10). By contrasting group A with group B, changes in ocean resolution appear to be more impactful than changes in atmosphere resolution alone, consistent with the results presented in Sect. 4.1. Following the criterion adopted in B17, the transition length-scale (LC) is quantified as the smoothing length where the SST-THF (rTQ) and SST tendency-THF (rTtQ) correlations intersect at the zero lag. This is formally expressed as follows:

Fig. 11
figure 11

SST-THF correlation for the Gulf Stream, as a function of time-lag and spatial smoothing length (degrees) for Group A models. SST leads (lags) THF for negative (positive) time-lags

Fig. 12
figure 12

Same as Fig. 11 but for Group B

Fig. 13
figure 13

SST tendency-THF correlation for the Gulf Stream, as a function of time-lag and spatial smoothing length (degrees) for Group A models. SST tendency leads (lags) THF for negative (positive) time-lags

Fig. 14
figure 14

Same as Fig. 13 but for Group B

$${L}_{c}:{\overline{{r}_{TQ}(x,y,0)}}^{{L}_{c }}=\left|{\overline{{r}_{{T}_{t}Q}(x,y,0)}}^{{L}_{c}}\right|$$

where the overbar denotes the box-car smoothing operator. Note that the cross-correlation term on the right-hand side is taken in absolute value, since it is usually negative. This metric provides a well constrained definition of LC accounting for the regime transition as diagnosed from both SST-THF and SST tendency-THF correlations.

Following the definition in (2), the length-scale LC is diagnosed for models and observations, in two steps. First, a 4th-order polynomial fit to rTQ and |rTtQ| data is obtained. Then, a numerical approximation of LC is derived using a Newton–Raphson algorithm to identify the root of nonlinear Eq. (2). This procedure is graphically illustrated in Fig. 15 (left panel). Here, model estimates appear to be scattered within the 3°-9° interval (but note that the majority of the models cluster below ~ 6°), thus systematically exceeding the observed ~ 3° estimate. This result reflects the distinct structure of simulated rTQ and rTtQ functions, compared to their observational counterparts. Specifically, simulated cross-correlation functions show a decorrelation spatial scale (for increasing smoothing factor) that typically exceeds the one in observations. This bias is particularly noticeable in the rTQ term (blue thin curves), while the rTtQ term (red curves) shows a reduced inter-model spread, and a closer agreement with observations.

Fig. 15
figure 15

Left: Best fit 4th order polynomial curves for rTQ (blue) and |rTtQ| (red) for PRIMAVERA models (thin) and observations (thick). The intersection point for each [rTQ, |rTtQ|] couplet is indicated with empty and filled circles for standard and high resolution model configurations, respectively, while a triangle is used for HadGEM3-GC31-MM configuration. A star is used for observations. The abscissa of the intersection point identifies the transition length-scale Lc (degrees). Right: Transition lengths (degrees) diagnosed for PRIMAVERA models (open circles) and observations (star). Note that models on the horizontal axis are ordered according to groups A and B (left to right)

In order to assess the impact of resolution, model-based and observational LC estimates (corresponding to the abscissas of the intersection points in Fig. 15, left panel) are shown in Fig. 15, right panel. In general terms, the link between model resolution and the simulated transition length is uncertain. ECMWF-IFS, HadGEM3-GC3 and CNRM-CM6 display an unambiguous LC bias reduction under resolution enhancement, with the largest impact in HadGEM-GC3 determined by the ocean grid refinement (comparing LL-MM with MM-HM). The other models, however, are negatively impacted by the change in resolution. Regardless of the sign of the change, it is worth noticing that, on average, models belonging to group A display the largest effect of resolution enhancement, when compared to group B models.

5 Summary and discussion

The impact of model resolution on the representation of air-sea interaction in the mid-latitude North Atlantic has been systematically examined in a multi-model set of present climate simulations, performed following the CMIP6 HighResMIP common experimental protocol in the framework of the EU-H2020 PRIMAVERA project. The ensemble consists of six different models, featuring a nominal horizontal resolution (for ocean and atmosphere) ranging from 250 to 25 km. The key findings of this study are summarised below.

  • Despite the wide range of resolutions spanned, signatures of the observed statistical relationship linking SST and THF variability are found in all model configurations. Based on a set of statistical metrics it was possible to assess that coupled GCMs based on either “laminar” (eddy-parameterised) or eddy-permitting oceans are able to discriminate between an ocean-driven regime, dominating the region controlled by the Gulf Stream dynamics, and an atmosphere-driven regime, typical of the open ocean regions. This result suggests a possible role for non-eddy driven, larger scale, oceanic variability over the GS mimicking the stochastic forcing associated with “ocean weather” processes.

  • The benefits of enhanced resolution become evident when looking at the spatial structure and degree of symmetry of the simulated covariance patterns. The increased model resolution leads to a more realistic representation of SST-THF (and SST tendency-THF) covariance. Comparing model groups A and B, it is further inferred that the detected improvements can be largely ascribed to the oceanic component. Specifically, increasing the ocean model resolution from low (100 km) to high (25 km) has a beneficial impact on the tilt and overall shape of the GS jet signature, turning from a predominantly zonal to a more realistic SW-NE orientation. On the other hand, an analogous increase in the atmospheric resolution appears to play a relatively minor role. The degree of symmetry of SST-THF correlation/covariance around the zero-lag is also strongly affected by the enhanced resolution, with high-resolution configurations exhibiting a considerably higher consistency with the theoretical functional shapes postulated by EBMs.

  • Scale-dependency is qualitatively well reproduced by PRIMAVERA models, all featuring the observed regime transition (from ocean-driven to atmosphere-driven) for increasing levels of spatial filtering. The simulated transition length-scale generally exceeds the observational estimate (3°), and is largely clustered within the 3°-6° interval. No clear dependency on the model resolution is found, with three (out of six) models showing a bias reduction when the resolution is enhanced.

What makes the eddy-parametrised models able to reproduce the characteristics of an ocean-driven regime, remains an open question. The occurrence of a regime transition not only in eddy-permitting, but also in eddy-parameterised models reveals the existence of coarse-grained processes acting in laminar oceans which are capable to effectively mimic the ocean variability needed to generate the observed SST-THF covariance over the GS region. The improved simulation of regional scale features achieved with eddy-permitting ocean configurations impacts on the degree of realism of the pattern of air-sea interaction, but not on the fundamental character (either ocean- or atmosphere-driven) of the interaction itself. These findings are further supported by an analysis of SST-THF and SST tendency-THF correlation/co-variance applied to an eddy-resolving configuration of the MPI-ESM model (MPI-ESM-ER; Gutjahr et al. 2019), run with T127 atmosphere (1°) and a 0.1° ocean, under fixed 1950 forcing conditions. Results of this analysis (presented in the Supplemental Material, Fig. SM1-4) show very marginal differences with respect to the eddy-permitting counterparts of the same model (MPI-ESM-HR and –XR), in line with Roberts et al. (2016).

These results appear to be not consistent with the main conclusions of the single-model studies of Kirtman et al. (2012) and Small et al. (2019). These authors point to (explicitly resolved) mesoscale ocean eddies as a key element to faithfully represent air-sea interactions over WBCs (see Fig. 19 in Kirtman et al. 2012 and Fig. 3 in Small et al. 2019). This apparent inconsistency may be reconciled by invoking a role for Ekman transport in eddy-parameterised models. This is an atmosphere-driven process which is able to supply variability to the ocean surface without requiring the explicit representation of quasi-geostrophic eddies. Surface wind anomalies over regions characterised by sharp climatological SST gradients can lead to Ekman heat transport anomalies that ultimately affect the heat content variability in the upper ocean. Fluctuations in the heat advection associated with Ekman dynamics will manifest themselves as ocean internal variability, although strictly speaking this variability is driven by the atmosphere and not by an energy cascade process associated with the instability of an oceanic front. Based on this interpretation, the injection of Ekman-driven variability (associated with large-scale wind forcing) in laminar oceans might compensate for their lack of internally generated variability and explain the apparent (qualitative) agreement displayed by eddy-parametrised and eddy-permitting models. The same process must be clearly at work in eddy-permitting models too, but it is likely overshadowed by their more vigorous, eddy-driven internal variability (see Small et al. (2020) for a thorough analysis of the role of Ekman heat transport on the upper ocean heat budget in the CESM model).

Based on the above considerations, even coarsely resolved oceans could supply the overlying atmosphere with a surrogate of the stochastic forcing (the No term in the 1-dimensional EBM described in Wu et al. 2006), necessary to reproduce the observed covariance between SST and THF anomalies. The underlying assumption is that the spectral characteristics of the simulated ocean surface variability, as perceived by the atmospheric model through the coupling process, are sufficiently close to a white noise forcing, as typically postulated in EBM-based studies (Barsugli and Battisti 1998; Wu et al. 2006), even in the absence of explicitly resolved mesoscale oceanic eddies. This circumstance would justify the agreement, in statistical sense, found for air-sea interaction across the wide range of grid resolutions featured by the PRIMAVERA models.

The primary role of ocean resolution in improving the representation of air-sea interactions is consistent with other studies based on the PRIMAVERA multi-model ensemble, analyzing different climate-relevant processes (Docquier et al. 2019; Roberts et al. 2020) and corroborates the idea of a critical threshold in the ocean model resolution, roughly placed around the eddy-permitting (~ 25 km) range, leading to a step-change in the degree of realism of the simulated features.

An aspect that is not covered by this study relates to the downstream impacts of the biases affecting ocean–atmosphere co-variability patterns. Systematic errors in the representation of ocean-driven and atmosphere-driven regimes may have far reaching impacts on the mid-latitude atmospheric circulation of the Northern Hemisphere and, more specifically, over the European sector. For instance, simulating a narrow, spatially confined ocean-driven regime (consistent with the observational estimates) could be crucial to realistically reproduce the observed Gulf Stream-induced anchoring of rainfall patterns, surface wind convergence and vertical circulation in the free troposphere (Minobe et al. 2008). Addressing these specific aspects will require dedicated analyses that will be reported in a future work.

Finally, although the present study focuses on the Gulf Stream region, results from a global covariance analysis of SST-THF and SST tendency-THF performed with the ECMWF-IFS model reveal that qualitatively similar results hold over the Kuroshio Extension (KE), while major discrepancies emerge when comparing LR against HR over the Southern Ocean (not shown). Specifically, in line with Small et al. (2019), no significant covariance is found in the LR model, in striking contrast with the HR configuration featuring a well distinguishable signature of the Agulhas Return Current and the Brazil-Malvinas confluence, both regions characterised by a vigorous eddy activity. To conclude, and with all the caveats of a single-model evaluation, while there are indications of the ocean forcing the atmosphere over the main WBC systems of the Northern Hemisphere (GS and KE) in low-resolution models, there is no such feature over the spatially extensive and climatically-relevant Southern Ocean.