Deep mixed ocean volume in the Labrador Sea in HighResMIP models

Simulations from seven global coupled climate models performed at high and standard resolution as part of the high resolution model intercomparison project (HighResMIP) are analyzed to study deep ocean mixing in the Labrador Sea and the impact of increased horizontal resolution. The representation of convection varies strongly among models. Compared to observations from ARGO-floats and the EN4 data set, most models substantially overestimate deep convection in the Labrador Sea. In four out of five models, all four using the NEMO-ocean model, increasing the ocean resolution from 1° to 1/4° leads to increased deep mixing in the Labrador Sea. Increasing the atmospheric resolution has a smaller effect than increasing the ocean resolution. Simulated convection in the Labrador Sea is mainly governed by the release of heat from the ocean to the atmosphere and by the vertical stratification of the water masses in the Labrador Sea in late autumn. Models with stronger sub-polar gyre circulation have generally higher surface salinity in the Labrador Sea and a deeper convection. While the high-resolution models show more realistic ocean stratification in the Labrador Sea than the standard resolution models, they generally overestimate the convection. The results indicate that the representation of sub-grid scale mixing processes might be imperfect in the models and contribute to the biases in deep convection. Since in more than half of the models, the Labrador Sea convection is important for the Atlantic Meridional Overturning Circulation (AMOC), this raises questions about the future behavior of the AMOC in the models.


Introduction
Open-ocean deep convection is a rare phenomenon, occurring only at a few locations in the world's ocean. It provides a vertical link between properties of the surface ocean and the deep ocean. In the North Atlantic, the northward flowing warm and salty water masses become denser because of large heat loss to the atmosphere and sink into the deep ocean. Deep convection ventilates the deep ocean with oxygen and plays an important role for the storage of carbon and heat.
The Labrador Sea is one of the main convection sites in the North Atlantic. Deep convection in the Labrador Sea and in the Irminger Seas (de Jong and de Steur 2016;de Jong et al. 2018) produce the deep water masses called Labrador Sea Water (LSW) (Clarke and Gascard 1983), which together with the dense overflow waters coming through Faroe Bank Channel and Denmark Strait (Dickson and Brown 1994) and originating from convection in the Greenland-Iceland-Norwegian Seas (GIN-Sea, Ronski and Budeus 2005) form the North Atlantic Deep Water.
The deep convection in the Labrador Sea is mainly driven by wintertime buoyancy loss to the atmosphere (Latif et al. 2006;Frankignoul et al. 2009), which is strongly governed by the North Atlantic Oscillation (NAO). Freshwater transports through Fram Strait (Holland et al. 2001;Jungclaus et al. 2005;Koenigk et al. 2006) and Denmark Strait (Ortega et al. 2017) can modify local density stratification as well and thus contribute to the variability of deep convection in the Labrador Sea. Labrador Sea convection varies strongly on interannual to decadal time scales (Yashayaev and Loder 2016) and was rather shallow in the 2000s, but recovered in recent years with mixing depths exceeding 2000 m (Yashayaev and Loder 2017).
Variability and change of deep convection affect local and remote climate. Reduced convection in the Labrador Sea is linked to lower surface salinity and temperature and more sea ice in the Labrador Sea and Davis Strait (Deser et al. 2002) but it has in addition remote effects on the atmospheric circulation (Koenigk et al. 2006). Deep convection and deep water formation, particularly in the Labrador Sea, have been seen for a long time as an important mechanism for the Atlantic Meridional Overturning Circulation (AMOC), which is fundamentally important for the climate in the North Atlantic and adjacent regions such as Western Europe and the Arctic (Manabe and Stouffer 1999;Mahajan et al. 2011;Koenigk et al. 2012;Jackson et al. 2015). Recent studies from Smeed et al. (2014Smeed et al. ( , 2018 indicate a weakening of the AMOC at 26.5 °N after year 2008. However, it is unclear if this observed weakening is caused by climate change or internal decadal variability (e.g. Roberts et al. 2014;Jackson et al. 2016;Robson et al. 2016). Caesar et al. (2018) used spatial patterns of sea surface temperature as AMOC indicator and suggested that the AMOC has been reduced since the mid-twentieth century, and Thornally et al. (2018) showed based on paleooceanographic data that the AMOC has been in a low state for the last 150 years. Many modelling studies have discussed the potential for a weakening AMOC as a response to global warming (Swingedouw et al. 2007;Cheng et al. 2013;Brodeau and Koenigk 2016;Koenigk and Brodeau 2017) and linked the weakening to a reduction of the deep water formation (Latif et al. 2006;Deshayes et al. 2007;Koenigk et al. 2007;Frankignoul et al. 2009;Langehaug et al. 2012).
However, more recently, the importance of deep convection for the AMOC has been questioned (Lozier et al. 2017(Lozier et al. , 2019Sayol et al. 2019). Kuhlbrodt et al. (2007) and Medhaug and Furevik (2011) identified wind-driven upwelling, gyre circulation, and wind and tidal vertical mixing as important processes sustaining the long-term strength of the AMOC, so that a collapse of the deep convection would not necessarily lead to a collapse of the AMOC (Gelderloos et al. 2012;Marotzke and Scott 1999). Menary et al. (2020) showed that the generation of density anomalies in the Irminger Sea, which propagate into the Labrador Sea is of importance. However, still short observational periods and potential lags of several years between AMOC and convection (Roberts et al. 2013;Brodeau and Koenigk 2016) make robust conclusions on the linkage between Labrador Sea convection and AMOC from observations difficult.
The deep convective process is temporally intermittent and spatially compact. This makes this process difficult to observe, and state-of-the-art climate models, such as models participating in the Coupled Model Intercomparison Project, Phase 6 (CMIP6), need to use parameterizations to represent convective processes (Fox-Kemper et al. 2019). Given the importance of the deep convection for climate and its future change, a reliable representation in climate models is highly important. However, Heuzé (2017) found that "the majority of CMIP5 models convect too deeply, over too large an area, too often and too far south". Although CMIP6 models seem to improve the representation of bottom waters, more improvements are required (Heuzé 2020).
Going beyond the horizontal resolution of the standard resolution CMIP6 models, we analyse in this study whether high-resolution models from the CMIP6 High-Resolution Model Intercomparison Project (HighResMIP, Haarsma et al. 2016) improve the representation of deep convection in the Labrador Sea. We use simulations from seven models participating in HighResMIP, which have been undertaken in the EU-H2020-project PRIMAVERA. Most of the highresolution model versions of these seven models have ocean grid resolutions of around 10-25 km in the Labrador Sea. Since the baroclinic Rossby radius of deformation is small in high latitudes (below 10 km in polar regions and 200 km in the tropics), mesoscale oceanic eddies are not resolved in the sub-polar convection regions in most of the HighResMIP models. We thus denote them as "eddy permitting" but not "eddy-resolving".
Recent studies found that the increased resolution in the HighResMIP models improved many aspects of the ocean including temperature and salinity biases , the northward ocean heat transport (Grist et al. 2018), sea ice in the Atlantic sector of the Arctic Ocean (Docquier et al. 2019, the position of the North Atlantic Current (Sein et al. 2018) as well as Arctic freshwater exports (Fuentes Franco and Koenigk 2019), all of them with the potential to improve the representation of Labrador Sea convection.
After this introduction, we proceed with describing the models, the data and the methods in Sect. 2. Sections 3 to 5 then show the results from this study and we conclude in Sect. 6.

Models and simulations
In this study we analyze seven global coupled climate models (see e.g. Vannière et al. 2019;Roberts et al. 2020), which contributed to HighResMIP (Haarsma et al. 2016) within the H2020-EU-project PRIMAVERA. These models are ECMWF-IFS   (Sidorenko et al. 2014;HR and LR setups: Sein et al. 2016) and EC-Earth3P (Haarsma et al. 2020). The set of HighResMIP experiments is divided into three tiers consisting of atmosphere-only and coupled runs, which span the period 1950-2050 (details in Haarsma et al. 2016). Here, we use the Tier 2 historical coupled simulations from 1950-2014 and the 100-year control simulations from these seven models for our analysis. The control runs used fixed 1950s forcing (greenhouse gases, including ozone and aerosol loading for a 1950s (~ 10 year mean) climatology). Both control and historical runs are initialized from the end of 30-50-year spin-up simulations, which were initialized using 1950-ocean conditions of the EN4 data set (Good et al. 2013), using 1950s-forcing. The 30-50-year spin-up simulations are not long enough to bring the models into equilibrium but they are sufficient to substantially reduce drifts in surface variables such as sea surface temperature and sea ice concentration. However, the deeper ocean might still show drifts throughout the entire historical simulation and these drifts might affect deep convection in the North Atlantic. The control runs are used to evaluate potential model drifts in the historical simulations.
All models performed historical and control simulations in at least two different resolutions. Changes in oceanic and atmospheric tuning parameters are kept to a minimum between low-and high-resolution simulations. All models use the same vertical ocean mixing scheme in their lowand high-resolution versions (Turbulent Kinetic Energy (TKE) mixing scheme or K-profile mixing scheme (KPP), see Table 1). Three models with changing ocean resolution use Gent-McWilliams (GM) parameterization of ocean eddies in their low-resolution ocean version but not in the high-resolution versions (HadGEM-3-GC31, ECMWF-IFS, AWI-CM.1-0) while CNRM-CM6.1 and EC-Earth3P use GM-parameterization in both low and high ocean resolution. The parameter values for e.g. horizontal eddy diffusivity and viscosity, iso-neutral diffusion and eddy-induced velocity coefficient vary with the horizontal resolution in the models (see Table 1 in Roberts et al. 2020). As will be shown in the result section, this does not seem to strongly affect the change of deep convection with resolution and the differences between high-and low-resolution model versions can mainly be attributed to the change in resolution.
The resolution varies among the models. A few of the models change both ocean and atmosphere resolution at the same time while others separately change ocean or atmosphere resolution. This allows us to analyze also the effect of increasing the resolution in only one component of the system. Five of the seven models use the NEMO-ocean model as ocean component. Although the NEMO-model configurations differ quite substantially from each other with different sea ice models (LIM2, LIM3, GELATO, CICE) and differences in parameterizations (e.g. mesoscale eddy closure), we acknowledge that the results we present on the effect of increasing resolution are skewed towards NEMO-models with 1° and 1/4° horizontal resolution. AWI-CM-1-0 and MPI-ESM1.2 use the same atmosphere component but different ocean components.
More details on the models and the simulations used in this study are provided in Table 1 and in Roberts et al. (2020).

Observational data
As a proxy for convection in the Labrador Sea, we use the simulated mixed layer depth (MLD) and compare it with observations from ARGO-profiles (Holte et al. 2017), which are provided on a 1° grid. Two different ARGO data sets of the MLD are existing and are used in this study: first, the climatological mean MLD in each grid point in March There exist some limitations of the ARGO data. First, only a few ARGO-profiles exist in many grid-points and in some no profiles at all. Second, ARGO-floats generally sample to a depth of 2000 m, thus MLD extending below 2000 m are not captured. Finally, the observational time period is relatively short given the long time scales of variability in deep convection. Thus, the ARGO-observations provide only an estimate of the averaged real world MLD for the period 2000-2015.
In addition to ARGO, we use ocean temperature (T) and salinity (S) from the EN4-data set for the period 1950-2014 (Good et al. 2013). EN4 uses optimal interpolation and relaxation to climatology on fixed horizontal and vertical length scales. For periods with sparse observations, this method might lead to substantial uncertainties. Thus, results from EN4 must be interpreted with cautious as well, especially before year 2000, when no ARGO data were available as input for EN4. However, Jackson et al. (2020) showed that temperature and salinity differences between earlier periods of EN4 and the period after 2000 are relatively small compared to biases in the models. Given the large decadal to multi-decadal variability of ocean properties in the sub-polar North Atlantic, the 65-year time period from EN4 is useful for comparison to the model results in addition to the comparison to the 15-year ARGO-data. Further, EN4 data are more comparable with models both in temporal and spatial scales, as both come with gridded data of much coarser resolution than the ARGO profiles, and are resolved at monthly time-scales. We calculate the MLD from EN4 based on the monthly mean values of ocean temperature and use a MLD threshold criteria of ΔT = 0.5 K following Obata et al. (1996). Here, we use only the temperature to calculate the MLD since the temperature values are more reliable than the salinity values in EN4. The T criterion that we apply, generally corresponds to the often used density criterion Δρ = 0.125 kg m −3 (e.g. Monterey and Levitus 1997). However, depending on the criterion and the observational data set that is used, the MLD may vary (Toyoda et al. 2017). In addition, the use of monthly averages compared to individual profiles as in ARGO MLD data leads to uncertainties (Toyoda et al. 2017).
To evaluate the surface heat fluxes in our models, we use monthly mean turbulent latent and sensible heat fluxes from

Method
Several different indices have been defined for the deep convection in the ocean (e.g. Koenigk et al. 2007;Schott et al. 2009;Yashayaev and Loder 2009;L'Hévéder et al. 2012;Lavergne et al. 2014). These indices consider either the maximum MLD and/ or the horizontal extent of the MLD. However, none of them excludes convective events that are too shallow to contribute to deep water formation. To overcome this problem, Brodeau and Koenigk (2016) defined the so-called "Deep Mixed Volume" (DMV) index, which only considers the convective mixing below a specific depth (critical depth z crit ) and integrates the volume of these deep mixed water masses in different convection regions of the North Atlantic. In our study, we use the DMV index for monitoring the deep convection using a critical depth of 1000 m for the Labrador Sea (Brodeau and Koenigk 2016). In the Labrador Sea, convection must reach a depth of around 1000 m in order to renew the LSW and contribute to the North Atlantic Deep Water (Yashayaev 2007). We define the Labrador Sea region as 70° W-40° W, 45° N-72° N, which covers the main Labrador Sea convection area in all seven models. We use monthly mean values of the March MLD of the model simulations to calculate the DMV. Short convection episodes that exceed z crit might thus be missed. The ocean mixed layer thickness in the models (variable mlotst following CMIP6-conventions is used) is defined by the depth at which a change from the surface sigma-t of 0.125 kg/m 3 has occurred (sigma-t criterion, Levitus 1982). This definition differs from the MLD criterion that we use for EN4 data and from the one used in the ARGO-data (compare Sect. 2.2). For additional comparison, we therefore use the ΔT = 0.5 K criterion to calculate the March MLD from monthly mean data of the models.
We calculate the DMV from the ARGO data and EN4 data to compare with the model results. For ARGO, we infilled grid-points with missing data by interpolating the nearest neighbours. To calculate the DMV with z crit = 1000 m from ARGO, we use the maximum MLD since usage of climatological MLDs leads likely to an underestimation of the DMV compared to using time-varying data. However, since only few profiles per grid point are available in ARGO, the maximum MLD is often not much larger than the climatological MLD. These issues and those already discussed in Sect. 2.2 make the ARGO estimates for the DMV uncertain. Similarly, the DMV calculated from EN4 data is uncertain as well, and thus the DMV from both observational data sources provide only a rough estimate for the real world DMV. The difference in DMV between ARGO and EN4 might be interpreted as observational uncertainty of the DMV. The DMV in EN4 in the Labrador Sea in March is 2.45 times larger than the DMV in ARGO in the period 2000-2014. While this is a huge uncertainty, the differences between DMV in many of the models and in ARGO and EN4 is much larger as we will show in Sect. 3.2.
As an additional comparison, we calculate the mean DMV based on critical depths of 0 m, thus considering the full mixed layer. For DMV with z crit = 0 m, the climatological MLD from ARGO has been used.
To calculate the density, we follow the definition of Millero and Poisson (1981). For the calculation of the mean density, temperature and salinity, averaged over the Labrador Sea, we exclude all grid points with a salinity below 30 psu to avoid the effect of low surface salinities near coastlines and ice edges on the average values.
For correlations, we calculate the Pearson correlation coefficient (r). We call a correlation significantly different from 0, if the p-value is 0.05 or smaller based on a two-sided student-t distribution.
To calculate the liquid freshwater transport through the Arctic straits, we use the model grid lines on the native grids of the models that are closest to the geographical landmarks that define Fram Strait (across 78° N), Northern Baffin Bay (78° N) and Denmark Strait (66° N). The freshwater is defined as the amount of zero-salinity water required to reach the observed salinity of a seawater sample starting from a reference salinity. Specifically, liquid freshwater transport (fwt, in m 3 /s) is estimated as for salinity S (in practical salinity units) and velocity v (in m/s) perpendicular to the section. As reference salinity Sref we use 34.8 psu for all models. The integration along z is performed from the bottom at depth D to the sea surface at height η (in this case η = 0). p1 and p2 are the landmarks and the integration is done considering dx as the length (dz as depth) between every grid point. The solid freshwater transport is calculated from the sea ice transports across the sections assuming a constant ice salinity of 5 psu and Sref of 34.8 psu.

Deep convection
This section shows first the analysis of the MLD in March, the month with the strongest convection in both observations and models, in the North Atlantic in the different models and in ARGO and EN4. Then, we focus on the DMV in the Labrador Sea, potential trends in the historical simulations and differences between models. Figure 1 shows the average MLD in March from ARGO, EN4 and all historical model simulations. In the period 2000-2015, the ARGO data suggest average MLDs of about 1000 m in the Labrador and GIN Seas (Fig. 1a). The EN4 data show a similar distribution of the MLD in the North Atlantic as ARGO (Fig. 1b). In the models, the MLD differs greatly and shows a strong dependence on the spatial resolution. While ECMWF-IFS-LR and EC-Earth3P show no or very shallow MLD in the main convection areas of the North Atlantic, many of the other simulations strongly overestimate the MLD compared to the observations, especially in the Labrador Sea. The models with particularly deep mixed layers show the largest horizontal extension of convection.

Mixed layer depth in the North Atlantic
Thus, compared to ARGO and EN4, they do not only overestimate the depth of mixed layer but also the area of deep convection. Some models simulate much too strong convection in the Labrador Sea but do not show any deep convection in the GIN Seas while other models overestimate the MLD in both seas. In contrast, in the Irminger Sea, the MLD is more consistent across models and agrees better with ARGO and EN4. Note that we here compare models' MLD averaged over 1950-2014 with ARGO-data from 2000-2015; MLD in EN4 is based on 1950-2014 as in the models. The comparison of MLD across models and between models and observations might depend on the definition of MLD.
To analyse this in more detail, we used March mean values of temperature from the model simulations to calculate the mixed layer depth based on the ΔT = 0.5 K criterion (as has been used for MLD in EN4). As expected, the MLD in the models changes based on the definition. Averaged over all models, the MLD based on the ΔT = 0.5 K criterion leads to deeper MLDs, but the change in MLD differs substantially between the models (compare supplementary Figure  S1 with Fig. 1). Particularly, in the ECMWF-IFS, MPI-ESM and EC-Earth3P-HR simulations, MLD is deeper while only small changes occur in the HadGEM3-GC31, CMCC-CM2 and AWI-CM1 simulations. Also, the spatial distribution depends partly on the MLD criterion. Particularly, along the southeastern tip of Greenland MLD is increased when using the ΔT = 0.5 K criterion. As discussed in Sect. 2.2, the realism of the EN4-data depends critically on the available observations and might change over time. Supplementary Figure S2 shows the MLD in EN4 in different 15-year periods between 1950 and 2014. Despite MLD variations between the periods, the biases in the models are substantially larger than the variations in EN4. At least parts of the MLD differences between different 15-year periods in EN4 are likely due to internal climate variability as a comparison to variability in the models between different 15-year periods reveals (supplementary Figures S3 and S4).
The MLD in the Labrador Sea deepens with increasing ocean resolution in all models (ECMWF-IFS, HadGEM3-GC31, CNRM-CM6-1, EC-Earth3P), except for AWI-CM-1-0, independent of the MLD-criterion (Figs. 1 and S1). The models showing deepening MLDs share NEMO as the ocean component, whereas AWI-CM-1-0 has FESOM as its ocean component. On the other hand, even the models with NEMO3.6 as their ocean component (compare HadGEM3-GC31, CNRM-CM6.1, CMCC-CM2 and EC-Earth3P) differ considerably. This suggests that either the different atmospheric or sea ice components or the choice of ocean parameters have a strong influence on the convection in the different NEMO models.
In contrast, the MLD differs little when only the atmosphere resolution is increased (compare ECMWF-IFS-MR with ECMWF-IFS-HR, HadGEM3-GC31-MM with HadGEM3-GC31-HM, and CCCM-CM2-HR4 with CCCM-CM2-VHR4). An exception is MPI-ESM1-2, where an increased atmospheric resolution reduces the MLD. This MLD reduction can be linked to wind forcing, which is too weak in MPI-ESM1-2-XR .
The place of deepest convection in the Labrador Sea varies somewhat across models, but we do not find any clear linkage to increasing ocean or atmosphere resolution.
To investigate the impact of natural variability on the mean March MLD in the historical period and to quantify the potential contribution of natural variations to the differences in MLD with changing resolution, we use an ensemble of historical simulations with the ECMWF-IFS model. The MLD in the low-resolution version ECMWF-IFS-LR is very shallow in all six ensemble members and there is no deep convection in the historical and control simulations (not shown). Thus, we concentrate in the following on the four members of ECMWF-IFS-HR, which all exhibit pronounced deep mixing in the Labrador Sea. These four ECMWF-IFS-HR members show considerable differences which reflect the impact of natural variability (Fig. 2a-d). The averaged March MLD  deviates in individual ensemble members by up to 200 m from the ensemble mean MLD. Although, this is a considerable amount, given the relatively long averaging period, the MLD differences between NEMO models with 1° and 0.25° resolution are much larger (compare Fig. 2 to differences between 1° and 0.25° simulations in Fig. 1). Even though four members are not sufficient to fully capture the complete range of total natural variability, these results suggest that natural variability cannot explain the differences in MLD, which occur between such simulations which have a change in ocean resolution. Finally, despite the variability across the ensemble members in

Deep mixed volume in the Labrador Sea
In the following, we concentrate on the DMV index for a detailed investigation of deep convection in the Labrador Sea. Figure 3 shows the DMV in the Labrador Sea in March in the historical model simulations. In agreement with Fig. 1, increasing the ocean resolution from around 1° to 0.25° leads to a generally higher DMV in all models using NEMO (ECMWF-IFS, HadGEM3-GC31, CNRM-CM6-1, EC-Earth3P, see 2nd and 3rd columns in Table 2), whereas the opposite is true for AWI-CM-1-0. Increasing the ocean resolution further to 1/12° in HadGEM3-GC31-HH does not further increase the DMV. The DMV varies strongly among models: ECMWF-IFS-LR does not show any deep convection events in the entire historical period, CNRM-CM6.1 and EC-Earth3P simulate only a few events with deep convection and AWI-CM-1-0-LR and both CMCC-CM2 versions simulate strong deep convection every winter. As for the MLD, most models overestimate the DMV compared to EN4 (Fig. 3h).    (Yashayaev andLoder 2016, 2017).
If we use a critical depth of z crit = 0 m instead of 1000 m in the Labrador Sea and thus consider the total mixed layer depth, the relative deviation of the DMV in the models from the observations is reduced as expected (not shown). However, AWI-CM-1-0-LR and CMCC-CM2 still overestimate the observed DMV (with z crit = 0) by a factor of around three and two, respectively. On the other hand, ECMWF-IFS-LR simulates only around 20% of the mixed volume compared to the observations. The comparison between z crit0 and z crit1000 reveals some non-linearites in the deep convection. While CNRM-CM6.1-h has a nine times higher DMV (z crit1000 ) compared to ARGO, it is only 16% higher for z crit0 , whereas the DMV (z crit1000 ) for MPI-ESM1-2-XR is 4.6 times higher compared to ARGO but 14% smaller for z crit0 .

Historical trends of DMV in the Labrador Sea
Ten of 17 simulations indicate a significantly negative trend of the DMV in the historical period (Fig. 3, Table 3). To investigate whether this trend is due to external forcing or due to model drift, we compare the DMV in the historical simulations with that from the 100-year 1950-control simulations ( Fig. 4 and Table 3). Most of the control simulations do not show any significant trend, and in 9 out of 17 historical simulations, the DMV trends in the historical simulations are significantly more negative compared to the first 65 years of the control simulations. This indicates external forcing as a major cause for the DMV reduction in the Labrador Sea in these historical simulations. Furthermore, the trends become more negative with higher ocean resolution. In all the models with NEMO as ocean component, the negative trends are stronger in the model versions with higher ocean resolution. In the AWI-CM1-model, both high-and lowresolution model versions show negative trends, however, the control runs show even larger negative trends and thus the negative trends in the historical simulations cannot be attributed to the changes in external forcing. A reduction of DMV in the historical period would be in line with some recent studies by Brodeau and Koenigk (2016) and Caesar et al. (2018). In contrast, the trend in the EN4 data for the same period, is significantly positive. However, the trend in EN4 data might be affected by the varying amount and quality of observational data, which are used to produce EN4. For example, from year 2000 onwards ARGO data were used in EN4 but were not available before, potentially making it harder to detect deep convection events in the earlier period. Figure 5 summarizes the results from Sect. 3.2 on the resolution dependency of the DMV in the Labrador Sea across the models. Each single model shows a clear dependence of the DMV on the oceanic resolution. The differences between models are large, and as discussed before, even models using the same NEMO ocean model exhibit a wide range of solutions. Models with coarse resolution (~ 100 km) in the ocean produce no or only shallow convection. Models with a resolution of 50 km and higher in the ocean, however, overestimate deep convection compared to ARGO and EN4. Increasing the atmosphere resolution has a minor effect on the DMV in the Labrador Sea, except for MPI-ESM1.2 and AWI-CM-1-0, where DMV is reduced with increased resolution.

Discussion of differences in the mean DMV between models
The strength of the deep convection in March is related to the vertical density distribution in the Labrador Sea. We analyse therefore the November density profiles in the Labrador Sea to discuss differences between models (Figs. 6 and 7a, c). We choose November since these profiles represent the stratification prior to the deep winter convection. In November, all models show a near surface layer of low density, mainly due to a combination of low salinity and relatively (compared to late winter) warm water near the surface (Fig. 7). In general, models with lower ocean resolution already show stronger stratification in the upper ocean in November than models with higher resolution (except for AWI-CM-1-0), which is consistent with their comparatively weaker DMV in late winter. This enhanced stratification is caused by colder and fresher surface water masses compared to models with high DMV (Figs. 7d, h). The two model simulations, which do not simulate any deep convection, ECMWF-IFS-LR and EC-Earth3P, show particularly strong density gradients in the upper ocean in November (Fig. 7c). Consequently, a large buoyancy flux would be necessary in winter to initiate deep convection in these two models. Models, which are warmer and saltier in the Labrador Sea in November are generally denser than models which are cooler and fresher. This relationship agrees with findings for CMIP5 models by Menary et al. (2015). The density profiles of the high-resolution ocean models agree well with ARGO and relatively well with EN4, which show a slightly lower density near the surface than ARGO due to lower salinity (Fig. 7h). The high-resolution models provide a realistic state of both T and S at the surface in the Labrador Sea both in November and March (Figs. 7a, b, d, e). We hypothesize that they transport more warm and salty water masses into the Labrador Sea compared to the lower resolution models (especially than the NEMO-models using the ORCA1-grid). This hypothesis is supported by recent results from Jackson et al. (2020). Furthermore, we find a clear relationship between the Sub-Polar Gyre (SPG) strength index (computed as the minimum (maximum absolute value) of the barotropic streamfunction between 65°-40° W at 53° N as in Danabasoglu et al. (2014)) and the DMV across the models. A stronger Sub-Polar Gyre in the high-resolution models is related to a stronger northward transport of warm and salty water masses into the Labrador Sea.
Although most of the high-resolution models simulate realistic S and T properties in the upper Labrador Sea, the DMV differs strongly between the models and is substantially higher than in ARGO and EN4. Vertical gradients across the upper 200 m are slightly larger in most highresolution models than in the observations (Fig. 7c), which would suggest reduced convection compared to ARGO. However, this reduction in convection might be compensated in these models by an excessively shallow surface layer with low density (Fig. 6). This shallower surface layer requires less heat loss to be eroded in these models and could contribute to the overestimation of the deep convection in late winter. Figure 7 k shows, as expected, a strong dependence of the models' DMV on winter surface heat fluxes (SHF) in the Labrador Sea. The correlation between March DMV and January-March SHF across models reaches 0.76. Figure 7 k shows in addition that all high-resolution models (except for MPI-ESM-1-2-XR) overestimate (meaning too strong fluxes from the ocean to the atmosphere) the SHF compared to WHOI-OAFlux. This overestimation likely contributes to stronger buoyancy loss in the winter and too strong convection in March in the high-resolution models. Models with very large DMV (e.g. both CMCC-CM2 model versions) have rather weak stratification in November as well as strong surface heat fluxes in winter.
While our analysis above explains parts of the differences in the DMV in the different models, it does not explain the reasons for the different biases in the models. Conditions in the Labrador Sea are affected by both large scale ocean (e.g. advection of water masses into the Labrador Sea, position of the North Atlantic Current and its subbranches, extension of the sub-polar gyre) and atmosphere processes (e.g. exact position, strength and variability of the North Atlantic Oscillation pattern) and descriptions of local processes (e.g. different parameterizations or parameters for mixing and lateral and vertical diffusion processes or sea ice processes). Since the models differ in many aspects in their atmosphere, ocean and sea ice components and partly in their initialization method, it is not possible to identify any specific reason or parameterization responsible for the differences in S, T and SHF between models as part of this multi-model study. However, representation of mixing processes in ocean models is imperfect and contributes both to the biases in the DMV and to the DMV-differences between models and must be given particular attention in future model developments.

Variability of DMV in the Labrador Sea and driving processes
The interannual variability of the DMV is large in all models (Fig. 3). Some of the simulations (EC-Earth3P, HadGEM3-GC31-LL, MPI-ESM1-2, CNRM-CM6.1 and ECMWF-IFS-MR) indicate substantial variability at decadal or longer periods, in which phases with and without convection alternate. Such intermittent deep convection was also suggested from observations (Lazier et al. 2002;Yashayaev and Loder 2016) and is visible in the EN4-data as well. However, in most model simulations, deep convection occurs in almost every winter or not at all. Deep convection depends strongly on the buoyancy of the ocean surface layer in the convection regions, which in turn depends (amongst other things, e.g. advection of heat) on the heat loss to the atmosphere and the influx of fresh water into the convection regions. Brodeau and Koenigk (2016) showed that the turbulent surface heat flux (SHF) is the main driver for interannual variability in the DMV and Fig. 7 k shows that the mean SHF in the Labrador Sea is highly correlated with the mean DMV across models. Thus, we will focus first on the effect of varying SHF on the DMV in the Labrador Sea. Figure 8 shows the winter (January, February, March) SHF in each of the model simulations. Observations from WHOI-OAFlux show that the largest SHF of more than 200 W/m 2 is found near the ice edge in the Labrador Sea, with regions of heat loss extending to the southern part of the SPG, south of Iceland and along the southeast coast of Greenland, as well as in the northern Norwegian-Greenland Seas and Barents Sea. The large-scale features of this pattern are reproduced by most of the models. ECMWF-IFS-LR and to a lesser degree EC-Earth3P, however, which both simulate too weak convection, strongly underestimate the SHF in the Labrador Sea. Increased ocean resolution generally improves the representation of the observed SHF pattern. In particular, the extension of high SHF from the Labrador Sea into the southwestern branch of the SPG and the high SHF in the northern Greenland and Norwegian Seas are better simulated. However, a number of models (both CMCC-CM2 versions, HadGEM3-GC31-HH and CNRM-CM6.1) overestimate the SHF in the SPG. In addition, the SHF west and northwest of Scotland is too high in most of the models.

The impact of surface heat fluxes on the variability of deep convection
In the Labrador Sea, all high-resolution models with NEMO as the ocean component simulate increased SHF (averaged over the same box as used for the calculation of the DMV) compared to their lower-resolution counterparts (Table 2). In contrast, MPI-ESM1-2 shows reduced SHF with increased atmospheric resolution in line with the  convection. In all models, the interannual variability of winter SHF is significantly positively correlated with the DMV in March. Thus, large ocean heat losses in the winter are linked to strong DMVs in the following March, indicating that large upward surface heat fluxes lead the DMV. The correlation coefficient of SHF and DMV varies from 0.48 in CNRM-CM6.1 to slightly above 0.7 in ECMWF-IFS-MR, EC-Earth3P and CMCC-CM2-HR4. The relation between SHF and DMV is neither resolution nor model dependent.
The winter SHF in the Labrador Sea itself is governed by the atmospheric circulation. In all model simulations north-to-northwesterly winds, which advect cold air towards the Labrador Sea, lead to strong surface heat fluxes, which can erode the stratification of the ocean and increase convection (Ortega et al. 2011). These north-tonorthwesterly winds are linked to the positive phase of the North Atlantic Oscillation (NAO, defined as the leading EOF of geopotential height on the 500 hPa pressure surface over the European/Atlantic sector (80° W-40° E, 20-90° N)). The correlation between NAO and SHF is high in all models except for ECMWF-IFS-LR and EC-Earth3P (Fig. 9) and resemble well the observed correlation pattern between NAO and SHF (Fig. 9 a). In ECMWF-IFS-LR and (to a smaller extent) EC-Earth3P, sea ice extends far to the southeast and covers large part of the Labrador Sea (compare Docquier et al. 2019), and Fig. 7 Relation between DMV in the Labrador Sea in March (in 10 15 m 3 ) and surface density (a, b), surface temperature (d, e), surface salinity (h, i) in the Labrador Sea in November and March. c, f, j DMV versus differences between surface and 200 m depth for density, temperature and salinity in November. g DMV versus annual mean Sub-Polar Gyre index (SPG-index). AWI-CM-1.0 and CNRM-CM6.1 models are missing due to missing data. k DMV versus surface heat flux (positive means flux from the ocean into the atmosphere). AWI-CM-1.0 models are missing due to missing data. Observational data for the SHF are from WHOI-OAFlux Shown are means for the historical simulation 1950-2014. The red line is the linear regression line and the red numbers the correlation coefficient between DMV and the respective variable on the x axis across models thus, the correlations between NAO and SHF are smaller in the Labrador Sea. The NAO-index itself, is significantly positively correlated with the DMV in the Labrador Sea in all simulations with values between 0.31 (EC-Earth3P) and 0.67 (HadGEM-GC31-HM and CMCC-CM2-HR4) ( Table 2). The models with an ocean resolution around 1° (ECMWF-IFS-LR, EC-Earth3P, HadGEM-GC31-LL, CNRM-CM6-1) show a weaker correlation between NAO and DMV in the Labrador Sea than the models with higher resolution. Interestingly, this is not true for the correlation between SHF and DMV.
The rather high correlation between NAO and both SHF and DMV reveals the importance of the atmospheric circulation for deep convection in the models, however, increased DMV due to strong SHF might feed back on the SHF by continuously bringing warm water to the surface and causing more heat loss.
The spatial imprint of the NAO-index on the 500hpa geopotential height is shown in Fig. 10. All models reproduce the NAO-pattern of the ERA5-reanalysis data (Hersbach et al. 2020) well. However, the position of the negative pole over Iceland-Greenland and the extension of the positive pole towards Eurasia vary slightly among models. We do not see a systematical difference between the NAO-pattern in the low resolution and the high-resolution models, which could explain why the correlation between NAO and SHF differ between high-and lowresolution models.

The impact of Arctic freshwater and sea ice exports on the variability of deep convection
A number of studies discussed the effect of Arctic freshwater export as a potential source of variability of the deep water convection in the Labrador Sea (Holland et al. 2001;  Jungclaus et al. 2005;Koenigk et al. 2006). Here, we analyze the correlations between freshwater transports across different sections (Fram Strait, Denmark Strait, northern Baffin Bay) and deep convection in the Labrador Sea in the historical simulations of the models. Table 4 shows the freshwater exports out of the Arctic Ocean into the North Atlantic through Fram Strait, Baffin Bay and the Denmark Strait. Although differences between models are large, the exports through Fram Strait are generally larger than through Baffin Bay. The total freshwater exports through Fram Strait (liquid + solid export) vary between around 80,000 m 3 /s in the two CNRM-CM6.1 models and 160,000 m 3 /s in ECMWF-IFS-LR and HadGEM3-GC31-MM. The ratio between liquid and solid freshwater export through Fram Strait differs strongly in the simulations. While in HadGEM3-GC31-LL and all ECMWF-IFS and EC-Earth3P simulations most of the freshwater leaves the Arctic in the form of sea ice, the liquid and solid fractions are of similar size in the other models. In CNRM-CM6.1-h, the liquid fraction is even larger than the solid fraction. The amount of total freshwater passing through Denmark Strait is smaller than the export through the Fram Strait in all models except for ECMWF-IFS-LR and MPI-ESM1.2-XR. The liquid part is dominant since large parts of the ice melt in the East Greenland Current on its way from Fram Strait to Denmark Strait leading to large differences in the solid components between these locations.
The low-resolution versions of ECMWF-IFS, HadGEM3-GC31, CNRM-CM6.1 show a larger fraction of solid freshwater exports through Fram Strait and larger liquid transports through Baffin Bay (EC-Earth3P as well for Baffin Bay) compared to their higher resolution counterparts. The sum of freshwater exports through Fram Strait and Baffin Bay differs more between the models than between different versions of single models. Despite the large differences in mean Arctic freshwater exports into the North Atlantic, there is generally no clear linkage to the mean DMV in the Labrador Sea. However, for ECMWF-IFS-LR, we speculate that the very large freshwater fluxes, particularly in the form of sea ice, through Denmark Strait contribute to the low surface density in the Labrador Sea (compare Figs. 6,7) and consequently to the suppression of deep convection in the Labrador Sea. To further investigate whether the variability of freshwater exports affects deep convection in the Labrador Sea, we correlate the solid and liquid transports across all sections with the DMV. In all model simulations, the annual mean southward transport of both liquid and solid freshwater across Fram Strait and the liquid transport across Denmark Strait are weakly negatively correlated with the deep convection in the Labrador Sea in March (ranging between −0.1 and −0.4). The highest correlation is reached when the freshwater transport through Fram Strait (and Denmark Strait) leads the convection by one to two years (and zero to one year). Increased southward transport of sea ice and liquid freshwater through Fram Strait along Greenland's east coast and through Denmark Strait leads to more freshwater input into the Labrador Sea, which tends to freshen the surface ocean, increase stratification and reduce the convection. Figure 11 shows for the two model simulations with the highest correlation between freshwater transport through Denmark Strait and DMV in the Labrador Sea (HadGEM-GC31-LL, EC-Earth3P-HR) that increased freshwater transport reduce the MLD in the Labrador Sea. For most other model simulations, the effect of freshwater transports on the MLD is rather small.
In some models, the southward transport of liquid freshwater through Baffin Bay is positively correlated with the deep convection in the Labrador Sea (up to r = 0.35 in HadGEM3-GC31-LL). This may seem counterintuitive, but while northerly winds in the Baffin Bay cause strong SHF in the Labrador Sea and dominate the convective conditions, they simultaneously lead to increased freshwater transports through Baffin Bay.
We do not find any resolution dependency of the correlation between freshwater exports and convection in the Labrador Sea. This result is in contrast to a recent study from Fuentes Franco and Koenigk (2019) where they analyzed a set of HadGEM3-GC2 simulations at different resolutions and found larger correlations with increased resolution. non-eddying to eddy-present and eddy-rich". Roberts et al. (2020) also analysed the relation between the time-mean values of the DMV and the AMOC in the historical runs. They found that there is a strong relationship between DMV and the AMOC strength across models; models with more deep convection in the Labrador Sea have a stronger AMOC. As shown in our Sect. 3.2, only few models simulate a DMV that is consistent with observed estimates.

The linkage of the DMV to the AMOC
The effect of high resolution on the AMOC in the High-ResMIP model simulations has been studied in more detail in Roberts et al. (2020). They found that "the AMOC tends to become stronger as model resolution is enhanced, particularly when the ocean resolution is increased from However, these models underestimate the AMOC (except for CNRM-CM6-1) compared to the RAPID-MOCHA array observations (Cunningham et al. 2007;Smeed et al. 2004) whereas some of the models (HadGEM3-GC31-MM and -HM, MPI-ESM1.2-h, AWI-CM-1-0-LR) markedly overestimate the DMV in the Labrador Sea but simulate a realistic AMOC. Thus, the linkage between the mean values of the AMOC and the DMV in the models is not consistent with the observations. To investigate the impact of variability in the deep convection on the variability of the AMOC, we performed cross-correlation analyses between the DMV in Labrador Sea and the AMOC (at 26°N) for lags between −/ + 10 years. In agreement with results by Brodeau and Koenigk (2016), annual values are only rather weakly correlated with each other. We thus focus here on correlations of linearly detrended and 10-year low pass filtered values of DMV and AMOC (Fig. 12) in the 100year 1950-control simulations. Positive lags mean that the AMOC leads the DMV, negative lags mean that the DMV leads AMOC. Maximum values of the correlations vary between around 0.3 (AWI-CM-1-0-h, ECMWF-IFS-MR, CMCC-CM2-HR, EC-Earth3P-HR) and 0.8 (HadGEM3-GC31-LL, CNRM-CM6-1-h). Most model simulations with significant correlations reach their maximum correlation when the DMV leads the AMOC by 0-5 years. Neither the amplitude of the correlations or the lag of the maximum correlation show any robust resolution dependency. For the HadGEM-GC3.1 model family for example, HadGEM-GC3.1-LL shows a high correlation Fig. 12 Correlation between the DMV using a critical depth of 1000 m in the Labrador Sea in March and the AMOC index for the 100-year control simulation. Both timeseries were detrended and filtered with a 10-year low-pass filter. Area enclosed by dotted lines represents the 95% confidence calculated as 2/sqrt(N), where N is the number of independent data based on the time that takes autocorrelation to fall below 1/e. Positive lags mean AMOC leads DMV, negative lags mean DMV leads AMOC. The low-resolution version of EC-Earth3P does not produce any deep convection events in the control simulation of around 0.8 when the DMV leads by 5 years, HadGEM-GC3.1-MM a correlation of 0.5 at lag 0, HadGEM-GC3.1-HM 0.8 when DMV leads by 1 year and HadGEM-GC3.1-HH shows a correlation of 0.7 when the DMV leads the AMOC by 2-3 years.

Conclusions
We have here analyzed historical and 1950-control simulations from seven global climate models, which followed the HighResMIP protocol, and investigated the effect of the increasing resolutions in their ocean and atmosphere components on deep convection in the Labrador Sea.
The ocean resolution clearly affects the deep ocean mixing in the Labrador Sea. Convection activity is enhanced with increasing ocean resolution in four out of five models in this study. However, all these models use NEMO (although in different configurations) as their ocean component and all of them increase the horizontal ocean resolution from 1° to 1/4°. It remains therefore unclear whether global models with other ocean components respond differently to an increased resolution. In the model, in which convection is reduced at higher ocean resolution (AWI-CM-1-0), this reduction results very likely from the simultaneously increased resolution in the atmosphere. The further increase of horizontal resolution in HadGEM-GC31-HH from 1/4° to 1/12° leads to slightly reduced convection. Thus, the conclusions on the effect of increasing ocean resolution on the convection in the Labrador Sea are only valid for models including NEMO as ocean model and for increasing resolution from noneddy permitting to eddy permitting. Future studies need to explore the effect of increasing the resolution from eddypermitting to eddy-rich in more detail.
Increasing the ocean resolution from 1° to 1/4° in the models with NEMO as the ocean component has a larger impact on the convection than increasing the atmosphere resolution in these models. In contrast, MPI-ESM1-2, in which only the atmosphere resolution has been increased, and AWI-CM-1-0 (increased resolution in both atmosphere and ocean) show substantially reduced convection in the Labrador Sea at high resolution. Both models (AWI-CM-1-0, MPI-ESM1-2) use the same atmospheric component (ECHAM6.3) and the reduction of DMV with increased atmospheric resolution can probably be linked to weaker winds in the high-resolution version of ECHAM6.3 .
Nine out of the 17 different model versions show a significantly negative trend in the Labrador Sea convection in the historical simulation compared to their respective control simulation, thus supporting the view that the trend is externally forced. In five out of the seven models, the negative trends are stronger in the higher resolution versions of the models. This agrees well with recent finding by Roberts et al. (2020) who showed that the AMOC tends to decline more rapidly in higher-resolution models, since the AMOC is expected to weaken in response to a reduction in Labrador Sea convection in the models.
The models differ strongly in the intensity of the deep convection in the Labrador Sea. While two of the low-resolution models do not simulate any deep convection in the Labrador Sea, other lower resolution and most of the highresolution models overestimate the deep convection compared to ARGO and EN4 data. We need to keep in mind that uncertainty regarding deep convection exist in both ARGO and EN4. However, our results are qualitatively robust across different time periods and both data sets.
The differences in convection between models can be linked to different states of the ocean vertical stratification in November and to differences in the winter atmospheric forcing (with winter surface heat losses ranging substantially across the models). Generally, we find that larger upper ocean density (mainly due to higher salinity) in November and stronger winter surface oceanic heat loss in the Labrador Sea leads to stronger convection in March. Increasing the ocean resolution improves the vertical stratification of the upper Labrador Sea in late autumn with rather realistic upper ocean temperatures and salinities compared to ARGO and EN4. The models with warmer and more saline upper ocean surface masses in the Labrador Sea show a stronger subpolar gyre circulation, which could transport these warmer and more saline water masses from lower latitudes into the Labrador Sea. However, despite having reasonably realistic water masses in the Labrador Sea in late autumn, deep convection in March is overestimated in the high-resolution models. As potential causes we identified (1) a too shallow surface layer with low density in late autumn, which can be more easily eroded in the winter, and (2) too high surface heat losses during winter. However, the too shallow upper layer is partly compensated by too strong density gradients in the upper ocean, and the overestimated SHF can partly be a response to too strong convection. Thus, it is likely that other uncertainties in the model descriptions such as parameterizations of sub-gridscale processes in the ocean are contributing to the biases in the convection.
The variability of the DMV is mainly governed by varying surface heat fluxes and their driving large scale atmospheric modes. The NAO is governing the surface heat fluxes in the Labrador Sea and is significantly correlated to the DMV in the models. The density stratification of the Labrador Sea in late autumn plays a role for the variability as well since a strong stratification requires stronger atmospheric forcing to erode the stratification. The upper ocean density is also affected by freshwater transport, however, the variability of Arctic freshwater exports plays in most of the models a smaller role for the variability of the DMV compared to the surface heat flux variations.
The DMV in the Labrador Sea is highly positively correlated (r = 0.6-0.8) with the AMOC at 26°N in around half of the model simulations at the decadal scale. In these simulations, the DMV leads the AMOC by a few years. In the other simulations, the correlations are positive as well but lower (0.3-0.4) and time lags of the highest correlations are not robust across these simulations. The correlations between the DMV and the AMOC are not dependent on the resolution.
The large bias in the simulation of deep convection in the Labrador Sea in the models is a concern since a realistic simulation of deep convection is important for the large-scale ocean circulation, in particular the AMOC and the North Atlantic subpolar gyre, and for the northward heat transport in the ocean and its related impacts on atmosphere and sea ice. Thus, a realistic representation of convection in the Labrador Sea is an important prerequisite for skillful decadal predictions (Menary and Hermanson 2018). The lack of such a realistic representation also raises serious questions about the future behaviour of the AMOC in climate models and its consequences for local and global climate. Thus, future studies need to analyze potential deficiencies in the ocean parameterizations in more detail to improve the representation of the deep ocean convection in the models.