Characterizing, modelling and understanding the climate variability of the deep water formation in the North-Western Mediterranean Sea

Observing, modelling and understanding the climate-scale variability of the deep water formation (DWF) in the North-Western Mediterranean Sea remains today very challenging. In this study, we first characterize the interannual variability of this phenomenon by a thorough reanalysis of observations in order to establish reference time series. These quantitative indicators include 31 observed years for the yearly maximum mixed layer depth over the period 1980–2013 and a detailed multi-indicator description of the period 2007–2013. Then a 1980–2013 hindcast simulation is performed with a fully-coupled regional climate system model including the high-resolution representation of the regional atmosphere, ocean, land-surface and rivers. The simulation reproduces quantitatively well the mean behaviour and the large interannual variability of the DWF phenomenon. The model shows convection deeper than 1000 m in 2/3 of the modelled winters, a mean DWF rate equal to 0.35 Sv with maximum values of 1.7 (resp. 1.6) Sv in 2013 (resp. 2005). Using the model results, the winter-integrated buoyancy loss over the Gulf of Lions is identified as the primary driving factor of the DWF interannual variability and explains, alone, around 50 % of its variance. It is itself explained by the occurrence of few stormy days during winter. At daily scale, the Atlantic ridge weather regime is identified as favourable to strong buoyancy losses and therefore DWF, whereas the positive phase of the North Atlantic oscillation is unfavourable. The driving role of the vertical stratification in autumn, a measure of the water column inhibition to mixing, has also been analyzed. Combining both driving factors allows to explain more than 70 % of the interannual variance of the phenomenon and in particular the occurrence of the five strongest convective years of the model (1981, 1999, 2005, 2009, 2013). The model simulates qualitatively well the trends in the deep waters (warming, saltening, increase in the dense water volume, increase in the bottom water density) despite an underestimation of the salinity and density trends. These deep trends come from a heat and salt accumulation during the 1980s and the 1990s in the surface and intermediate layers of the Gulf of Lions before being transferred stepwise towards the deep layers when very convective years occur in 1999 and later. The salinity increase in the near Atlantic Ocean surface layers seems to be the external forcing that finally leads to these deep trends. In the future, our results may allow to better understand the behaviour of the DWF phenomenon in Mediterranean Sea simulations in hindcast, forecast, reanalysis or future climate change scenario modes. The robustness of the obtained results must be however confirmed in multi-model studies.


Introduction
Open-sea deep convection and the associated deep water formation (DWF) are among the most important ocean phenomena driving the ocean thermohaline circulation (Marshall and Schott 1999). They occur in few places of the world including the North-Western Mediterranean Sea in which they have been observed since a long time (MEDOC Group 1970;Leaman and Schott 1991;Schott et al. 1996) and still recently (Durrieu de Madron et al. 2013). In this specific area, open-sea deep convection leads to the formation of the Western Mediterranean Deep Water (WMDW) with θ-S-ρ characteristics, historically close to 12. 75-12.92 • C, 38.41-38.46 and 29.09-29.10 kg/m 3 (Mertens and Schott 1998). In the North-Western Mediterranean Sea, the phenomenon, its various phases and the associated processes have been studied and described since the earlier dedicated field campaigns (MEDOC Group 1970) and synthetized in Marshall and Schott (1999). Generally, three main phases can be identified: the preconditioning phase, the violent mixing phase and the restratification-spreading phase. The preconditioning is a long-term phenomenon. It is mainly due to high salinity at surface and intermediate depths, to a local circulation isolating the water masses during winter and to a regional isopycnal doming that all favour ocean deep convection (Madec et al. 1996;Marshall and Schott 1999). The high salinities are due to a strong net water loss at basin scale related to the Mediterranean dry climate and to the advection of the Levantine Intermediate Water (LIW), a salty and warm intermediate water coming from the Eastern Mediterranean Sea. The doming is itself related to a regional cyclonic circulation partly due to the local dominant wind forcing and to previous DWF events maintaining an adequate horizontal density gradient. The violent mixing phase occurs in winter (maximum mixed layer depth is commonly observed in February or March) when the surface waters are cooled enough to become unstable by northerly or northwesterly strong, dry and cold winds called Mistral and Tramontane (Leaman and Schott 1991;Durrieu de Madron et al. 2013). Under these atmospheric forcings, the water column can be mixed within the so-called mixed patch surrounded by a rim current. The mixing can eventually reach the bottom at more than 2000 m (Mertens and Schott 1998;Durrieu de Madron et al. 2013;Stabholz et al. 2013). The restratification and spreading phase starts early in spring when the buoyancy flux becomes positive and when the lateral advection of lighter waters becomes strong enough to destroy the mixed patch (Madec et al. 1991b;). Many meso-and submeso-scale processes are involved in the restratification and spreading phase such as baroclinic instabilities of the rim current and submesoscale coherent vortices (Madec et al. 1991b;Marshall and Schott 1999;Testor andGascard 2003, 2006;Bosse et al. 2015).
In addition to the large range of processes involved, the DWF phenomenon also shows a strong climate variability. θ-S-ρ characteristics of the WMDW have always been variable in time and space (Mertens and Schott 1998) and recent abnormal DWF events have been observed (Schroeder et al. 2008). Interannual variability has been documented by both in-situ observations (Mertens and Schott 1998;Béthoux et al. 2002;Houpert et al. in revision) and modelling studies (Castellari et al. 2000;Herrmann et al. 2010;L'Hévéder et al. 2013). In addition, long-term warming and saltening trends have been established over the last decades (Béthoux et al. 1990;Rohling and Bryden 1992;Krahmann and Schott 1998;Send et al. 1999;Béthoux et al. 2002;Rixen et al. 2005;Zunino et al. 2009). To our knowledge, the robust value of this trend and its attribution to human activities either global climate change or river daming is still under debate (Béthoux et al. 1990;Rohling and Bryden 1992;Krahmann and Schott 1998). This trend could also be related to natural variability, for example, to the decadal variability of the incoming Atlantic Water characteristics or to the mixing with saltier and warmer Mediterranean Outflow Waters at the Strait of Gibraltar (Millot 2007). Moreover, it is expected that the DWF phenomenon may be strongly affected by climate change during the twenty-first century (Somot et al. 2006;Adloff et al. 2015) with impacts on the regional biogeochemistry (Herrmann et al. 2014).
Due to the complexity of the phenomenon at various temporal and spatial scales, observing, modelling and understanding the long-term temporal variability of the North-Western Mediterranean deep water formation still remain today very challenging tasks. Long-term and continuous monitoring of the phenomenon started only recently thanks to the HYDROCHANGES programme with deep moorings (Schroeder et al. 2013) and to the integrated observing system MOOSE (Mediterranean Ocean Observing System for the Environment, www.moose-network.fr) since 2010 with annual and monthly CTD cruises and continuous recording by surface buoys and deep moorings (Testor et al. 2012;Durrieu de Madron et al. 2013). Therefore, our current knowledge of the long-term variability of the phenomenon before 2007 mainly comes from yearly case studies and sparse field campaigns (Leaman and Schott 1991;Schott et al. 1996;Schroeder et al. 2008;Durrieu de Madron et al. 2013) or from satellite-derived reconstructions . First attempts to model the North-Western Mediterranean DWF were carried out using academic configurations (Madec et al. 1991b) or 1D-models (Mertens and Schott 1998). First 3D multi-annual modelling studies were performed around year 2000 (Myers et al. 1998;Myers and Haines 2000;Castellari et al. 2000) often using ad-hoc flux corrections or sea surface salinity (SSS) adjustments to empirically enhance the surface water density and allow deep convection. Improved air-sea fluxes with higher temporal and spatial resolutions and improved ocean initial conditions to initialize water masses were finally required to obtain a realistic DWF modelling in the North-Western Mediterranean without ad-hoc adjustements, for case studies of a given winter (Demirov and Pinardi 2007;Herrmann et al. 2010;Beuvier et al. 2012;Estournel et al. 2016) or for multi-annual simulations (Somot et al. 2006;Sannino et al. 2009;Herrmann et al. 2010;Béranger et al. 2010;L'Hévéder et al. 2013). These realistic modelling studies confirm the dominant role of the winter air-sea fluxes but also the key role of the water mass preconditioning to explain the interannual variability of the phenomenon. Besides, Josey et al. (2011) and Papadopoulos et al. (2012) determine the favourable largescale weather patterns leading to strong heat losses over the Mediterranean Sea. In particular, they identify that the East Atlantic pattern, characterized by a ridge over the Atlantic and a low in Eastern Europe, leads to above-normal heat losses over the North-Western Mediterranean Sea. The modelling studies dealing with a particular case study also point out the key role of the daily temporal variability of the atmosphere forcings Herrmann et al. 2010) and of the meso-scale activity (Demirov and Pinardi 2007;Beuvier et al. 2012).
Despite noticeable improvements in the last decades in the way to observe, model and understand the North-Western Mediterranean DWF, the long-term characterization of this phenomenon and its accurate modelling at climate scale still suffer from various deficiencies. Those deficiencies finally limit our understanding of its interannual variability and trends, as well as our capacity to foresee its possible evolution in a changing climate. For example, to our knowledge, since Mertens and Schott (1998) and Béthoux et al. (2002), no long-term reference dataset has been established to characterize the DWF interannual variability in terms of mixed layer depth, convective surface, DWF rate or water mass characteristics. This prevents evaluating accurately long-term model simulations. In addition, most of the model studies published to date (except for L 'Hévéder et al. 2013) use ocean models in a forced mode despite potential high-frequency coupling effects during the strong and short heat loss events leading to the DWF. All the published model studies also stop their chronology either at the beginning of the 2000s (Sannino et al. 2009;Béranger et al. 2010;L'Hévéder et al. 2013) or just after winter 2005 ) whereas convective years have been extensively observed afterwards (Durrieu de Madron et al. 2013;Houpert et al. in revision). Consequently, none of the published studies try to assess the recent observed trends in the deep water mass characteristics.
The main objectives of the current study are therefore to: • Characterize the interannual variability and the trends of the DWF in the North-Western Mediterranean Sea since the 1980s. For this purpose, a thorough reanalysis of past in-situ and satellite observations is carried out in order to establish reference time series of quantitative indicators of the DWF phenomenon; • Simulate the DWF phenomenon, its interannual variability and the observed recent trends. For this purpose, a multi-decadal hindcast simulation is performed with a fully-coupled regional climate system model including the high-resolution representation of the regional atmosphere, ocean, land-surface and rivers and forced at its lateral boundaries by atmosphere and ocean global reanalyses. The simulation is designed to be, as much as possible, stable and homogeneous in time and to realistically represent the period 1980-2013, and is evaluated using the observation-based indicators; • Improve the understanding of the interannual variability of the DWF phenomenon and of the recent trends in the deep water mass characteristics. After evaluation, the hindcast simulation is used to identify the main driving factors of the phenomenon climate variability and their relative contributions.
The manuscript starts with the description of the material and methods (Sect. 2). The results are presented in Sect. 3, the limits of the current study are discussed in Sect. 4 before the conclusions in Sect. 5.

Material and methods
This section presents the geography of the studied area, the model used and the simulation performed as well as the observation-based indicators. We also introduce the notion of daily weather regimes that will be used later as possible explaining factors of the air-sea fluxes over the Gulf of Lions.

Geographical area and particular zones
In the analysis, we will mainly focus on the North-Western Mediterranean Sea defined as the Mediterranean zone north of 39.9 • N and west of 9.5 • E and called NWMED hereinafter. In the model used, the surface of this zone is equal to 22.8 × 10 10 m 2 and its volume to 4.1 × 10 14 m 3 . Within this zone, we define the Gulf of Lions area (acronym GoL in the following) as a square centered over the main open-sea deep convection area. The limit of the chosen square are 40.6 • N-42.5 • N; 4 • E-6.5 • E. This choice is a compromise between the will of simplicity (we would like our study to be easily reproducible in other models) and the will to define an area excluding the shelf convection and including the usual zone of open-sea deep convection. In the model configuration (see below), the GoL corresponds to 441 grid meshes that is to say a surface of 4.8 × 10 10 m 2 , a volume of 1.2 × 10 14 m 3 and a maximum depth of 2773 m. Our strategy here is different from the one used in L' Hévéder et al. (2013) who define a very specific zone changing every year and depending on the sea level height gradient. We verified that the results are not very sensitive to the choice of the GoL box. Inside the GoL box, we also define the LION point (5 • E-42 • N) which approximately corresponds to the LION surface buoy and deep in-situ moorings (see below) for which multi-variable, nearly continuous and long-term observations are available. The geographical area studied and the three main zones (NWMED, GoL, LION) are shown in Fig. 1.

Motivations for the modelling strategy
As shown by previous works, the long-term modelling of the DWF phenomenon in the North-Western Mediterranean Sea requires at the same time high-resolution representation of the regional atmosphere Béranger et al. 2010), high-resolution representation of the Mediterranean Sea itself (Demirov and Pinardi 2007;) and simulations with stable and temporally homogeneous water masses (Somot et al. 2006) in order to avoid spurious trends or temporal breaks due to inhomogeneous forcings for example. In addition, we consider that the air-sea interface must be specified as freely as possible allowing high-frequency feedbacks and with no surface relaxation nor flux correction. Indeed, daily heat losses above 1000 W/m 2 (Mertens and Schott 1998; and episodic local warm sea surface temperature (SST) anomalies due to LIW entrainement (Mertens and Schott 1998) could be the potential sources of high-frequency two-way coupling that should be taken into account. Besides, the water budget including the river representation and the near-Atlantic surface water characteristics must be carefully represented as both parameters drive the long-term value of the Mediterranean salinity and therefore strongly influence the preconditioning phase (Sannino et al. 2009;Dubois et al. 2012). Moreover, it has also been demonstrated that the daily and interannual variability of the air-sea fluxes are key drivers of the DWF phenomenon L'Hévéder et al. 2013) and model simulations must follow the true day-to-day atmosphere chronology to hope to represent the phenomenon variability from daily to interannual time scales. We tried to take into account all the aforementioned constraints about the DWF modelling to design the modelling tool used in the study.

Description of the Regional Climate System Model: CNRM-RCSM4
We developed at Météo-France/CNRM a dedicated fully coupled Regional Climate System Model (RCSM) driven by global ocean and atmosphere reanalyses at its lateral boundary conditions and using the spectral nudging technique to guaranty the day-to-day chronology of the atmosphere model (Colin et al. 2010;Herrmann et al. 2011). This model, called CNRM-RCSM4, is the fourth generation of coupled Regional Climate Models (RCMs) developed at CNRM dedicated to the study of the Mediterranean area (for the previous generations, see Somot et al. 2008;Herrmann et al. 2011;Dubois et al. 2012). It covers the official Med-CORDEX domain (www.medcordex.eu, Ruti et al. 2015) that is to say the Mediterranean climate zone, the Mediterranean Sea, the Black Sea, their respective river catchment basins and part of the near-Atlantic Ocean. CNRM-RCSM4 includes the regional representation of the atmosphere at a 50-km resolution (ALADIN-Climate model, Colin et al. 2010;Herrmann et al. 2011), the land-surface at 50 km (ISBA model, Noilhan and Mahfouf 1996), the rivers at 50 km (TRIP model, Decharme et al. 2010) and the ocean at 10 km (NEMOMED8 model, Beuvier et al. 2010;Herrmann et al. 2010) with a daily coupling based on the OASIS3 coupler (Valcke 2013). Note that nowadays, Atmosphere-Ocean Regional Climate Models (AORCM) or Regional Climate System Models (RCSM) are becoming common modelling tools to study regional climates and regional seas. This is especially true for the Mediterranean area Artale et al. 2010;Krzic et al. 2011;Herrmann et al. 2011;Dubois et al. 2012;Drobinski et al. 2012;L'Hévéder et al. 2013;Sanna et al. 2013;Akhtar et al. 2014) thanks to coordinate activities such as the European CIRCE project Dubois et al. 2012) and the Med-CORDEX initiative (Ruti et al. 2015). The detailed description of CNRM-RCSM4 and of the various components can be found in Sevault et al. (2014). We recall here only the model characteristics relevant for the current study. For the atmosphere and land-surface representation, we use the version 5.2 of ALADIN-Climate including the ISBA surface scheme that was used to perform the CNRM CORDEX simulations over Africa, North-America, Europe and the Mediterranean (Herrmann et al. 2011;Lucas-Picher et al. 2013;Tramblay et al. 2013;Jacob et al. 2014). It has a similar physical package to the global climate model ARPEGE-Climate (Voldoire et al. 2013) used in the CMIP5 exercise. In the current study, we use a Lambert conformal projection centered at 43 • N and 14 • E with 101 longitude grid-points and 63 latitude grid-points in the physical central zone and 31 vertical levels. The time step used is 1800s. ALADIN includes the Fouquart and Morcrette radiation scheme (FMR15, Morcrette 1989) that computes the surface long-wave and short-wave radiation terms used by the ocean model. This radiation scheme incorporates the effects of clouds, water vapor and greenhouse gases, the direct effect of the aerosols as well as the first indirect effect of the sulfate aerosols. The computation of the turbulent air-sea fluxes (latent heat flux, sensible heat flux and momentum flux) is based on Louis (1979).
The ocean component of CNRM-RCSM4 is the model NEMOMED8 ), a regional eddypermitting configuration of the version 2.3 of the NEMO ocean model (Madec 2008). NEMOMED8 covers the Mediterranean Sea (without the Black Sea) plus a buffer zone including a part of the near Atlantic Ocean with an horizontal resolution of 1/8 • × 1/8 • cos(latitude), namely between 9 and 12 km over the Mediterranean Sea and around 10 km in the North-Western Mediterranean Sea. It has 43 vertical levels, with layer thickness increasing from 6 to 200 m. The partial steps definition of the bottom layer is used to match as much as possible the local true bathymetry (Madec 2008, see Fig. 1). The ocean physical parameterizations used in this study are the same as in Beuvier et al. (2010). In particular, a 1.5 turbulent closure scheme is used for the vertical eddy diffusivity (Blanke and Delecluse 1993) and the ocean convection is parameterized with an enhanced vertical diffusivity coefficient set to 50 m 2 /s in case of unstable stratification. This simple parameterization shows a good behaviour in North-Western Mediterranean open-sea deep convection situation (Somot et al. 2006;Herrmann et al. , 2010Beuvier et al. 2012;L'Hévéder et al. 2013).
The TRIP river routing model (Decharme et al. 2010) is used here to convert the simulated runoff by the ISBA land surface scheme into river discharges using a river channel network at 0.5 • resolution that covers the whole catchment basin of the Mediterranean Sea and Black Sea rivers. All rivers are fully coupled except for the Nile for which 12-monthly mean climatological values corresponding to post-Aswan dam situation are used (two mouths, mean annual value equal to 875 m 3 /s). Coupling between all the different components is achieved at a daily frequency : airsea fluxes computed in ALADIN and river discharges at the river mouths computed in TRIP are sent to the ocean model NEMOMED8, grid-point river discharges from the ISBA land-surface model are sent to the river module TRIP and SST from NEMOMED8 is sent to the atmosphere model ALADIN. In addition, the runoffs flowing into the Black Sea are added to the precipitation-evaporation budget of the Black Sea and sent to NEMOMED8.

Description of the long-term hindcast simulation
Using CNRM-RCSM4, a multi-decadal hindcast simulation (1980-2013) is carried out. External forcings are the followings: the global atmosphere reanalysis ERA-Interim (Dee et al. 2011) used as lateral boundary conditions for ALADIN updated every 6 h, the global ocean reanalysis NEMOVAR-COMBINE (Balmaseda et al. 2010) used in the NEMOMED8 Atlantic buffer zone to provide temperature, salinity and sea level information updated every month (see Sevault et al. 2014, for more information), the greenhouse gas concentrations following observed values and the aerosol concentrations using the Tegen aerosol climatology (Tegen et al. 1997). The period chosen is the longest ERA-Interim period available at the time of the study. To ensure a good temporal chronology at the synoptic scale for the atmosphere, we apply the spectral nudging technique that allows a better control of the large-scales inside the ALADIN domain by ERA-Interim (Colin et al. 2010;Herrmann et al. 2011).
Initial conditions for the ocean model have been chosen carefully in order to represent, as much as possible, the known state of the water masses before the 1980s. More precisely, we combine the 12-month climatology from MEDAR/MEDATLAS Group (2002) and the 10 year-filtered interannual dataset from Rixen et al. (2005) to reconstruct a typical month of July in the 1960s. In this initial condition file, the bottom layers (2300 m) of the GoL area defined above have the following mean characteristics: 12.7 • C, 38.41 and 29.10 kg/m 3 with nearly no spatial variation over this area. Those values are in agreement with historical values reported in the literature (12.7-12.9 • C, 38.41-38.46, 29.09-29.10 kg/m 3 ; Mertens and Schott 1998) for the period before the Western Mediterranean Transition starting during winter 2005 (Schroeder et al. 2008. Note however that model initial conditions are on the cold and fresh side of the observed range. For this initial condition file, the volume of water denser than 29.10 kg/m 3 over the NWMED area is equal to 0.7 × 10 14 m 3 that is to say 17 % of the total volume of the area. To ensure the quasi-stability of the model run, we performed a dedicated 26-year long spin-up before the start of the run [see details in Sevault et al. (2014)]. We are aware that this spin-up may be too short to reach an equilibrium state of the model run especially for the deep water masses due their longer renewable time. At the end of the spin-up, the bottom water masses of the GoL area (spatial average at 2300 m) are denser than in the initial conditions due to a salinity increase and they have the following characteristics: 12.7 • C, 38.42 and 29.11 kg/m 3 . At the scale of the GoL area, the spatial distribution remains nearly homogeneous. In addition, the volume of water denser than 29.10 kg/m 3 has increased to reach 1.8 × 10 14 m 3 at the end of the spinup, that is to say just before the beginning of the hindcast run (44 % of the total volume of the NWMED area).

Observation-based indicators
We present here the various indicators based on in-situ and satellite observations that will allow to characterize the climate variability of the DWF phenomenon. The goal is not to address all the specific processes involved in the opensea deep convection but to obtain quantitative aggregated indicators comparable with the model outputs. Some are available over the whole simulation period and the whole NWMED area but others are limited to the zone close to the LION point and only for the most recent years.

Yearly maximum mixed layer depth in winter
This indicator can be considered as the most classical one to estimate if a year is convective or not and to evaluate model simulations (Mertens and Schott 1998;Béranger et al. 2010;L'Hévéder et al. 2013). Here we try to estimate the maximum Mixed Layer Depth (MLD) during a given winter in the NWMED zone. This information can come from different sources, described below from the more robust to the less robust: Using this ∆T1 first criterion, a MLD was calculated for the first 300 m of the upper water column. • a second criterion was required to define a more precise MLD below 300 m and using the deep mooring data. After performing sensibility tests for different temperature criteria and regarding the accuracy of the temperature sensors used on the LION mooring, we defined a ∆T2 = 0.01 • C criterion and a reference level at 310 m.
If the MLD calculated with the ∆T1 criterion is deeper than 300 m, then the second criterion ∆T2 is used to define the MLD, otherwise the MLD is calculated only with the ∆T1 criterion. This double temperature criterion allows to get the best estimate of the MLD using the combination of the surface buoy and deep mooring data available over the period from September 2007 to July 2013. In addition, it was consistently checked against classical MLD estimates carried out with profiles from profiling floats, gliders and R/Vs in the vicinity of the mooring. This indicator corresponds to method a in Table 1.
• from the change in bottom water mass characteristics before and after a given winter. This criterion is based on an expert analysis of pairs of vertical profiles taken before and after a given winter at approximately the same location. Using the temperature, salinity and density time-series from the LION mooring, Houpert et al. (in revision) show that, after each event of bottomreaching deep convection, the thermohaline characteristics of the bottom water undergo a step which is due to the fact that the mixed layer reaches the bottom with a slightly different temperature and salinity. They also show that if the deep convection reached the bottom, the changes in the deep water characteristics can clearly be identified on CTD stations carried out the year after in the Gulf of Lions. By comparing the θ-S profiles before and after a specific winter, one can identify a year of bottom-reaching deep convection thanks to the apparition of a θ-S anomaly in the bottom layer. For each year, if we can detect strong anomalies in θ-S characteristics of the bottom water, then we set the maximum MLD to 2500 m (method b in Table 1). The interest of this method is to identify years of deep convection when no data were available to characterize the wintertime maximum MLD. • from the literature (Mertens and Schott 1998), we obtain values for the years 1982, 1987 and 1992 (method c in Table 1). • from a mixed layer depth criterion applied to observed vertical profiles available during the winter months (January, February, March) over the NWMED area. In order to exploit as much as possible the available profiles and after sensitivity tests on recent years, we apply a ∆T criterion equal to 0.1 • C or a ∆ρ equal to 0.02 kg/m 3 (when the salinity is available) between the surface and the base of the MLD both criteria being nearly equivalent. Note that this indicator requires to have insitu observations during the winter months and probably always leads to underestimate the actual maximum MLD due to space and time undersampling but it allows to obtain values for nearly every year of the model period. We decide to keep a value only if at least 5 vertical profiles are available for a given winter. The database of in-situ vertical profiles is the same as the one used in Houpert et al. (2015). The yearly maximum is kept in Table 1 (method d).

Yearly maximum extension of the deep convection surface
The maximum spatial extent of the deep convection is a quantitative estimate of the DWF as it is likely that a large deep convection surface leads to a large volume of newly formed DW. In the observations, this indicator can be estimated using two different and independent methods: • the maximum extension of the zone of minimum chlorophyll-a surface concentration in winter before the spring bloom. This zone corresponds to a deep mixing zone because the minimum concentration patch is due to a dilution effect of the chlorophyll-a over the mixed depth (Auger et al. 2014): the deeper the mixing, the lower the chlorophyll-a surface concentration. Chlorophyll-a images can be retrieved using remote sensing products when cloud cover is low over the zone. As the North-Western Mediterranean Sea is a cloudy area in winter, this method does not allow to achieve a continuous monitoring of the convection activity but allows to estimate a winter maximum extension of the convective surface using all available images. This convection indicator was already used for example in Herrmann et al. (2010) and Durrieu de Madron et al. (2013). The difficulty is to set a concentration threshold to define the limit of the minimum concentration zone that is to say to set a threshold that could correspond to a given mixing depth. For example, Durrieu de Madron et al.
(2013) use a minimum chlorophyll-a concentration of 0.12 mg/m 3 for winter 2011-2012 to obtain a maximum extension equal to 1.55 × 10 10 km 2 . Here we use two thresholds to take this uncertainty into account (method e and f in Table 1) following Houpert et al. (in revision).
Using glider data and satellite images, they defined a minimum (0.15 mg/m 3 ) and a maximum threshold (0.25 mg/m 3 ) corresponding to the transition between the mixed conditions characterized by low chlorophylla and the stratified conditions characterized by higher For the yearly maximum MLD, method a uses LION surface and deep moorings, b is based on θ-S anomalies between 2 consecutive years, c corresponds to Mertens and Schott (1998) values, d is based on ∆T and ∆ρ criteria applied on vertical profiles from Houpert et al. (2015). For the yearly maximum convective surface, e, f correspond to minimum and maximum criteria using the satellite data, g, h to minimum and maximum criteria using in-situ data. For the volume of DW denser than 29.10 kg/m 3 , method i uses the Rixen et al. (2005) interannual gridded product ( chlorophyll-a. In this transition area, the MLD estimated by the glider reached 1000 m or more. • when deep convection occurs in the Gulf of Lions, the warm and salty LIW layer is vertically mixed and cooled down. The mean temperature of the 400-600 m layer is hence a good indicator to discriminate whether the water column once underwent deep mixing or not. To estimate this parameter, we use all the in-situ data collected by ships, Argo floats and gliders for the 2007-2013 period after an intercalibration procedure presented in Bosse et al. (2015). For winters 2007, 2008, 2011, 2012 and 2013, the data collected during the whole winter period (January to March) and binned into a 10x10 km grid cover more than 2/3 of the Gulf of Lions region. This enables to objectively map this subsurface variable over the deep convection zone. Observed values below 13 • C (resp. 12.95 • C) are associated with mixed layer greater than 1000 m (resp. 2000 m). Thus, those thresholds define bounds for the maximal extension of the deep convection surface (see Bosse 2015, for more details). Values correspond to methods g and h in Table 1.

Deep water volume and yearly DWF rate
The most quantitative way to estimate DWF is probably to follow the volume of the deep water for a given density threshold. This is very informative in models Herrmann et al. 2010) but is quite tricky to compute with in-situ observations as synoptic and dense networks of deep and well-intercalibrated CTD casts are hardly available.
• achieving such a network is one of the goal of the MOOSE programme with the so-called MOOSE-GE field campaign that occurs every summer in the North-Western Mediterranean area since 2010 (http://www. moose-network.fr, Testor et al. 2012Testor et al. , 2013. Waldman et al. (2016) use these dense field campaigns to estimate DW volumes and associated uncertainties for various density thresholds (29.10, 29.11 and 29.12 kg/m 3 ) for summer 2012 and summer 2013 for which enough CTD casts were acquired. Note that, due to observation availability constraints, their domain of study is defined by 2.5 • E-9 • E; 40 • N and a bathymetry deeper than 2000 m for a total volume of 3.3 × 10 14 m 3 and is therefore notably smaller than the NWMED domain. We use mostly here the 29.10 kg/m 3 threshold (method j in Table 1) as it corresponds to DW in the 1980s and allows to follow the DW along the whole simulation. In addition to the MOOSE-GE field campaigns, other similar networks were carried out in the frame of the DeWEX-Leg1 and DeWEX-Leg2 field campaigns for the months of Feb-ruary 2013 and April 2013 (Testor 2013;Conan 2013;Taillandier 2014). These additionnal CTD cast networks allow to follow the seasonal cycle of the DW volume for this specific year and to determine a DWF rate for winter 2012-2013 by contrasting the April 2013 and the August 2012 networks (Waldman et al. 2016). The DeWEX-Leg2 network in April 2013 is considered as the annual maximum DW volume just after the convection ceases whereas the MOOSE-GE2012 network in August 2012 is considered as a good approximation of the annual minimum DW volume before convection. The DWF rate obtained is equal to 0.9 ± 0.4 Sv for waters denser than 29.10 kg/m 3 , 1.4 ± 0.3 Sv for 29.11 kg/m 3 and 0.2 Sv for 29.12 kg/m 3 . Waldman et al. (2016) also provides an extrapolated estimate of the DWF rate at 29.11 kg/m 3 obtained for the North-Western Mediterranean Sea (domain close to our NWMED domain) and with the optimal minimum and maximum DW volumes. The extrapolated value is equal to 2.3 ± 0.5 Sv, that is significantly larger than the initial rate. • a second way to estimate DW volume is to rely on highresolution gridded 3D analysis of density. To our knowledge, most of the Mediterranean gridded products are representative of multi-annual periods (e.g. SeaDataNet, Schaap and Lowry 2010; MEDAR/MEDATLAS Group 2002) and can not allow to follow DW volume interannual variability. Here we use the 10-year filtered product detailed in Rixen et al. (2005) that propose a temporally evolving 3D analysis of the Mediterranean water masses that can be used to compute a DW volume representative of a given period of time (see method i in Table 1).
Note that model-based ocean reanalyses could have been another option but their deep water mass evolution is not yet reliable (Pinardi et al. 2013;Hamon et al. 2016), probably due to spurious effects of the assimilation scheme.

Deep water characteristics
The characteristics of the bottom water masses in the convection area allow to assess the trends in the WMDW formation zone. Three methods are explored to obtain data representative of the LION location.
• we first assess the evolution of past bottom water masses in the GoL zone using a merged historical database presented in Houpert et al. (2015) by extracting all the measurements done at 2300 m and in a circle of 100 km around the LION mooring location. Values are reported as method k in Table 1. If the scarcity of the oceanographic cruises in the 1980s and the 1990s didn't allow to characterize the interannual variability of the θ-S characteristics of the deep water, this data-base is essential to describe the long-term evolution of the deep waters and the possible shifts (like for example after the very intense event of deep convection in 2005/2006) in the DW characteristics often described as following a linear trend (Béthoux et al. 1990;Rixen et al. 2005). We are aware that a clean homogenization work is nearly impossible for this long-term time series due to the sparsity of the measurements and the variety of the data sources. Consequently results must be carefully interpreted. • data from one of the HYDROCHANGES deep moorings (Schroeder et al. 2013). Initiated in 2003, the HYDROCHANGES programme aims at recording time series of the hydrological characteristics of the water masses and of their variability in key places (straits, DWF areas, trenches, . . . ) of the Mediterranean, with short and light moorings equipped with a high-precision and high-stability CTD probe (Seabird SBE37) located a few meters above the seafloor. The HYDRO-CHANGES LION mooring (HC-LION, see Table 2 Table 1. • data from the LION deep mooring (see above) that monitors continuously the water column at the LION point from 2007. Monthly-mean values from the deepest sensor of the mooring are used (method m in Table 1).

Stratification index
The pre-winter water column vertical stratification can be quantified by a stratification index depending on the time and computed from the sea surface to a maximum depth of integration. To compute the stratification index, we follow here the Turner (1973) equation, already used in many Mediterranean studies (Lascaratos and Nittis 1998;Somot 2005;Herrmann et al. , 2010L'Hévéder et al. 2013): with N the Brunt-Väisälä frequency given by: where t is the time in seconds, z the depth in meters, ρ the potential density in kg/m 3 , g the gravitational acceleration (9.81 m/s 2 ), H the maximum depth of integration in meter. (1) The stratification index is expressed with the same unit as a temporally-integrated buoyancy loss (m 2 /s 2 ), the larger this index, the stronger the vertical stratification. This index is very difficult to estimate from the observations for a given location as it requires long-term well sampled vertical profiles of temperature and salinity with high precision measurements. Recently, estimates of the index and of the related sources of error were obtained over the 2007-2013 period in interpolating data from the LION surface buoy and deep mooring (Houpert et al. in revision). Before November 2011, the stratification due to the salinity in the first 200 m is unknown because of the absence of a conductivity sensor at the sea surface. To tackle this problem, we use a constant value corresponding to the shallowest salinity measurement of the mooring (at 170 m depth). To evaluate the error due to this approximation, we used independent hydrographic profiles collected in a 30 km radius from the mooring (from glider and R/V). The same independent hydrographic profiles were used to estimate also the sampling error due to the low vertical resolution of the mooring. Another potential source of error can come from biases in vertical density gradients induced by the intercalibration of the different instruments. As we estimate an error in the calculation of the potential density less than 0.005 kg/m 3 (due essentially to the calibration of the conductivity sensors), we propagated this error in the calculation of stratification index. For an integration to 1000 m depth, the error due to the accuracy of the intercalibration of the instruments, represents between 13 and 34 % of the total error (constant salinity + vertical discretization + intercalibration). However, if we use a stratification index integrated up to 2300 m, the errors due to the intercalibration of the instruments become very large (76-92 % of the total error).
In the current study, we therefore decide to use a reference depth of 1000 m for the computation of the stratification index from the LION surface buoy and deep mooring in the observations. For reasons explained later, we estimate the stratification index on December 1st by averaging the daily values between November 15th and December 15th for each available year (method n in Table 1, note that values are indicated with a 1 year shift in the table that is to say that the value computed for December 1st 2012 is indicated for the year 2013).

Definition of the daily weather regimes
There are several approaches to identify weather patterns in order to characterize the large-scale atmospheric circulation. In this work, the Weather Regimes (WR hereinafter) approach is used (Vautard 1990). The WR can be defined as the preferential states of the atmospheric circulation, characterized by various properties such as persistence, recurrence and stationarity. WR are usually obtained from classification techniques or cluster analysis, in which a large number of geopotential daily maps are organized into a few groups or classes. Previous works based on reanalysis and observed data (Robertson and Ghil 1999;Plaut and Simonnet 2001;Yiou and Nogaj 2004;Sanchez-Gomez and Terray 2005) have established links between WR and local extreme episodes of temperature and precipitation over different regions of the Northern Hemisphere. These links have been also identified in modelling studies (Cattiaux et al. 2013). In this work we use the WR approach to identify the main atmospheric circulation patterns in the North Atlantic sector. On this area, four WR have been previously identified in both winter (Vautard 1990), and summer seasons (Cassou et al. 2005). Here, we have obtained these four WR using the 500 hPa geopotential height (Z500) from the ERA-Interim reanalysis over 1980-2013 period. The decomposition of the large-scale flow has been performed using the k-means clustering algorithm (Michelangeli et al. 1995). Before the classification, we conduct an empirical orthogonal function (EOF) analysis of the daily anomalies of Z500 maps for the winter season, defined here from December to March. The first 10 EOFs have been retained, capturing about 90 % of the total variance and k-means clustering is applied in the space spanned by the leading principal components. The resulting WR are the positive and negative phases of the North Atlantic Oscillation (NAO+ and NAO−), occurring on average 39 and 25 days per winter respectively; the Atlantic Ridge (AR) characterized by an anomalous anticyclonic core in the centre of the North Atlantic basin (24 days per winter); and the Blocking (BK) pattern (33 days per winter), represented by an intense anticyclonic cell centred over the Scandinavian Peninsula. Note that the classification technique to obtain WR is quite different from a linear approach (i.e. EOFs), in which positive and negative states of the same leading pattern have the same probability to occur. In the current case, for example, NAO+ and NAO− patterns and their associated impact are not necessarily symmetric. The ability of the atmospheric component of the regional model CNRM-RCSM4 to capture the North Atlantic WR of the driving reanalysis has been previously studied in Sanchez-Gomez et al. (2009). They show that ALADIN-Climate captures correctly these large-scale patterns present in the reanalysis.

Mean behaviour in the North-Western Mediterranean Sea
Two articles have been devoted to the overall evaluation of the CNRM-RCSM4 model in the hindcast configuration, one dealing mainly with the evaluation of the atmosphere component  and the other with the evaluation of the air-sea fluxes, of the river discharges and of the ocean behaviour at the Mediterranean Sea scale (Sevault et al. 2014). Note that Sevault et al. (2014) used exactly the same simulation as we do whereas Nabat et al. (2015) used the model without spectral nudging and with a new aerosol climatology. We verified that the main atmospheric model biases over land are similar in the model version used in our study and in the C-AER simulation presented in Nabat et al. (2015). We don't repeat here the results of this overall evaluation but we describe, in this section, the mean behaviour of the model for the DWF representation in the North-Western Mediterranean Sea for the period 1980-2013 that is to say from winter 1980-1981 to winter 2012-2013 (33 full winters). Hereinafter, winters will be numbered by the year of the month of January, that is to say that the winter  Figure 2b shows the map of the 1981-2013 mean MLD for the month of February to illustrate the mean behaviour of the extension of the convection zone. It shows that the mean shape of the convective area is centered near the LION point with a local maximum reaching more than 800 m. It is limited at its northern boundary by the Northern Current and by the Northern Balearic front at its southern boundary. Similar shapes were obtained in previous studies (Marshall and Schott 1999;Beuvier et al. 2010;Herrmann et al. 2010;Béranger et al. 2010;L'Hévéder et al. 2013). Figure 2c shows a map of the number of days with a MLD thicker than 1000 m over the total period of the run. It therefore confirms the favored convection area near the LION location with a maximum value of 600 days with deep MLD there, that is to say 18 days/year on average. The comparison of Fig. 2b, c confirms that the convection is very variable in time as no point in Fig. 2b shows a MLD deeper than 1000 m whereas Fig. 2c shows a large number of points in which MLD deeper than 1000 m can occur at daily time scale. Figure 2c also shows that the extension of the convection area can come sometimes close to Spain and to the Menorca Island such as in winter 2004-2005 (see Fig. 2d) but that deep convection never occurs North of 42.5 • N or East of 7 • E. In particular, in the model, extension towards the Ligurian Sea is never simulated contrary to observations, probably because of the overestimation of the stratification in this area (see below). The winter with the maximal extension zone (MLD >1000 m) is 2004-2005 with 4.4 × 10 10 m 2 and a maximum occurrence of MLD > 1000 m equal to 43 days near the LION location as shown in Fig. 2d. Figure 3a is a volumic θ-S diagram in the GoL area for the whole simulated period. The colors show the percentage of the volume of a given θ-S class with respect to the total volume of the GoL in order to highlight the most populated density classes. The scale focuses on the warm and saline LIW and on the dense WMDW. The Winter Intermediate Water (WIW) can also be identified as a cold intermediate water above the LIW. However the Atlantic Water (AW) is not represented here. On average over the period 1980-2012, the most populated classes of WMDW have the following characteristics: 38.42-38.44, 12.7-12.8 • C, 29.11-29.10 kg/m 3 whereas the LIW are characterized by 38.5-38.54, 13.1-13.3 • C, 29.07-29.09kg/m 3 . Those values are in agreement with the literature (see an overview in Mertens and Schott 1998;Herrmann et al. 2010) and with the values given in Table 1 could lead to the overestimation of the convection in the central zone and the underestimation of its lateral extension. In particular this spatial bias may explain why convection is never observed in the Ligurian Sea in the model (see Fig. 2c).
In conclusion, in the North-Western Mediterranean Sea, the simulation shows the expected mean behaviour concerning the MLD seasonal variability, the geographical area of the open-sea deep convection, the deep water mass characteristics and the vertical stratification. In particular it is worth noting that the simulated characteristics of the WMDW allow to keep a realistic 29.10 kg/m 3 threshold in the following analyses to define the deep water mass and related DWF. The main identified weakness is the missing Ligurian Sea extension probably due to an overestimation of the stratification in this sub-basin.  Table 1. For the model, temporal and spatial maximum MLD values are computed using model daily outputs for each winter over the GoL area. We use two classical criteria of NEMO to define the MLD: the pycnocline criterion with the surface density used as reference and ∆ρ = 0.01kg.m −3 used as a threshold to define the bottom of the mixed layer, and the turbocline criterion using K z = 5 × 10 −4 m 2 /s as a threshold for the bottom of the mixed layer. Despite a density threshold adapted to the Mediterranean Sea, the pycnocline criterion always gives deeper MLD than the turbocline criterion in NEMOMED8 for the GoL area (+420 m on average over 1981-2013) and overestimates the visual estimation of the model MLD when checked for some case studies (not shown). Consequently, hereinafter, modelled MLD will refer to the MLD based on the turbocline criterion and all values will be given for this criterion. Note however that both interannual time series are well correlated (0.89). Besides, the maximum MLD values do not change much if we compute them over the larger NWMED area instead of the GoL area showing that the GoL zone is well chosen (mean difference of 20 m, temporal correlation 0.999). The GoL maximum MLD is also very well correlated with the yearly-maximum MLD computed at the LION point (correlation equal to 0.98) showing that this point (close to the in-situ surface buoy and deep moorings) is representative of the interannual variability of the whole zone at least in the model and was therefore well chosen to put the in-situ observations. Figure 5 shows that the model simulates a large mean value (1684 m) for the yearly maximum MLD with years with shallow, intermediate, deep or bottom convection as expected from the literature (e.g. Mertens and Schott 1998) and from the observation-based indicators. The model shows deep convection (defined by a maximum MLD thicker than 1000 m) for 22 years out of the 33 simulated winters, more often than the observed indicators (52 % of the observed years). The 1000 m limit to define deep convection has also been chosen by previous authors (Somot et al. 2006;Herrmann et al. 2010). It corresponds to the maximum depth of the LIW in the area and to the lower limit of the strong vertical stratification values as shown in Fig. 4. Indeed, when the MLD reaches the 1000 m depth, it can then quickly reach the bottom as only weak stratification remains below this level. The simulation does not show any significant trend for this parameter but a strong interannual and decadal variability (interannual standard deviation of 963m). Well-known large convective years are reproduced such as 1987, 1992, 2005(Leaman and Schott 1991Schott et al. 1996;Schroeder et al. 2008;Durrieu de Madron et al. 2013) even if 1992 is only detected using the pycnocline criterion. In addition, the 1990s appear as a decade with a weak convective activity whereas the 1980s and the recent years show a series of strong convective years. In particular, the model simulates bottom convection during 5 consecutive years between 2009 and 2013 as observed and 4 consecutive years between 1984 and 1987. The decadal variability of the maximum MLD is in good agreement with previous studies which also identify the 1990s as a period of weak activity and the 1980s as a period of strong activity (Sannino et al. 2009;Herrmann et al. 2010;L'Hévéder et al. 2013). Keeping in mind the large uncertainties related to the observed estimates, the match between the model and the observation-based indicator available for 31 different years is relatively good. Note however that some years show very large biases such as 1985, 1988, 1996 or 2004. The strong spatio-temporal undersampling of the observations, especially during wintertime, can lead to an underestimation of the true maximum MLD in method d, probably explaining some of the larger mismatches (also see the Discussion section). Over the 31 years with observed values, the model (turbocline criterion) shows a negative mean bias (−224 m) as expected due to the observation undersampling, a good standard deviation (963 m in the model, 1098 m in the observations) and a largely significant temporal correlation (0.65, significant at the 99 % confidence level). To our knowledge, this is the first time that a model simulation is evaluated with so many observed years thanks to a thorough reanalysis of the past observations and to the improved sampling of the recent years. We also verified that there is no significant autocorrelation within the modelled MLD time series itself (maximum autocorrelation = 0.23 with a 1-year lag) showing that having a deep MLD one winter is not a preconditionning for the convection one year later.

Yearly maximum extension of the convective area and DWF rate
The MLD and its temporal and geographical maximum value is often used as the only parameter to characterize and evaluate the interannual variability of the DWF (Sannino et al. 2009;Herrmann et al. 2010;Beuvier et al. 2012;L'Hévéder et al. 2013). However this diagnostic only gives a qualitative information concerning the phenomenon and does not allow to quantify it or to classify years for which the MLD reached the bottom. Using the model outputs and observation-based estimates for the recent years, this quantification can be done for the yearly maximum extension of the convective area computed over the NWMED area (Table 1; Fig. 6) and for the DWF rate computed over the same area and with a 29.10 kg/m 3 threshold to define the dense waters (Table 1; Fig. 7, black bars and black circles). The yearly maximum surface of the convective area (in m 2 ) is computed using model daily outputs and two depth criteria for the MLD (deeper than 600 m and deeper than 1000 m). Besides, the DWF rate (Sv) is computed every year as in previous studies Beuvier et al. 2010;Herrmann et al. 2010;L'Hévéder et al. 2013;Sevault et al. 2014): first we compute the difference between the maximum volume of waters denser than 29.10 kg/m 3 for a given year minus the minimum volume of the same water class for the previous year. Those dense water (DW) volumes are computed using monthly-mean 3D model outputs of the potential density. Maximum volume is often in spring just after the convection whereas minimum volume is often in late autumn or in winter just before the start of the convection (see also Herrmann et al. 2010;Waldman et al. 2016). This difference of volume (m 3 ) is then divided by 10 6 and by the number of seconds in a year to obtain yearly DWF rate expressed in Sv. It is worth noting that using monthly-mean files instead of dailymean files to estimate the DWF rate may lead to underestimate DWF rate as the minimum and maximum volumes can be larger with daily files. We tested this difference for the 7 winters between 2006 and 2013 for which daily 3D outputs are available. As expected, the DWF rate is always higher with daily files with a mean difference of +0.03 Sv (+8 %) and a maximum difference of 0.1 Sv (+23 %) reached in 2012. The strong interannual variability of the DWF phenomenon is confirmed by Fig. 6 (mean value: 1.1 × 10 10 m 2 and std: 1.2 × 10 10 m 2 for a MLD deeper than 1000 m) and Fig. 7 (mean value: 0.28 Sv, std: 0.36 Sv) with years showing no DWF. A list of 5 years (1981,1999,2005,2009,2013) can be defined for which DWF can be considered as very strong with a maximum convective surface larger than 2 × 10 10 m 2 and a DWF rate above 0.6 Sv. Using those criteria, the most convective winter is 2004-2005 with a DWF rate of 1.2 Sv (stronger than the average plus 2 times the standard deviation) and a maximum convective surface reaching 4.4 × 10 10 m 2 (resp. 5.7 × 10 10 m 2 ) for a MLD deeper than 1000 m (resp. 600 m). This winter has been already identified as exceptional in the literature (e.g. Schroeder et al. , 2008Herrmann et al. 2010;Beuvier et al. 2012).  Tziperman and Speer (1994). The values obtained with the model CNRM-RCSM4 are close to those published for similar long-term simulations and using similar diagnostics: Castellari et al. (2000) show DWF rate values ranging from 0.01 to 1.6 Sv on average over the 1980-1988 period, but strongly depending on the surface forcing applied and of the salinity ad-hoc corrections. For example, they obtain a very small value (0.02 Sv) when using daily fluxes without salinity ad-hoc correction. Beuvier  However, despite a simulated bottom convection, our model seems to underestimate the convective activity of winter 1987 (DWF rate equal to 0.1 Sv) with respect to previous studies (Castellari et al. 2000;L'Hévéder et al. 2013). Note that the way to compute the DWF rate in the current study is close to the one used in Lascaratos et al. (1993), Castellari et al. (2000) or Durrieu de Madron et al. (2013) as it is based on the volume of the newly-formed dense water for a given winter. However it is different from the one used in Walin (1982), Lascaratos (1993), Tziperman and Speer (1994) or Somot et al. (2006) which is only based on the DW produced by the surface air-sea fluxes, that is to say without taking into account the ocean mixing. DWF rate following the Tziperman and Speer (1994) method are always larger than using the Lascaratos et al. (1993) method (see , Waldman et al. (2016), for a discussion on this issue] and are not easily comparable.
Besides, we also found that there is no significant autocorrelation within the modelled DWF rate time series itself. This shows that having a strong DWF rate for a given winter does not favour strong convection one year later.

Understanding of the interannual variability of the DWF
In the previous sections, we demonstrated that the CNRM-RCSM4 simulation is able to reproduce well the seasonal timing, the geography, the intensity and the interannual variability of the DWF phenomenon over the 1980-2013 period (33 DJFM winters). In the following, we make the assumption that the model performs well for the right physical reasons and we use the model outputs to try to improve the understanding of the interannual variability of the DWF phenomenon. The goal is to identify the main driving factors and their relative contribution.
Concerning the interannual variability, our study partly revisits but also extends the results obtained by Herrmann et al. (2010) and L'Hévéder et al. (2013) using a different modelling framework, a more recent period and with a more stable and more in-depth evaluated simulation. In addition, we also investigate the role of the daily scale in the air-sea fluxes and we try to give new insights concerning the driving factors.

Role of the winter air-sea fluxes
As already underlined by many studies (see the introduction), one of the major drivers of the DWF interannual variability is the air-sea fluxes accumulated over winter over the area of interest. For every year Y of the 1981-2013 period, we compute the integrated buoyancy loss cumulated over an extended 4-month winter from December 1st of year Y − 1 (T 1) to March 31st of year Y (T 2) and averaged over the GoL area following Marshall and Schott (1999). This index will be called BL hereinafter.
with where t is the time in seconds, B the surface buoyancy flux in m 2 /s 3 , HF the net surface heat flux in W/m 2 , WF the net surface water flux in m/s, SSS the sea surface salinity, C p the specific heat capacity (equal to 4000 J · kg −1 · K −1 in NEMOMED8), ρ 0 the reference density (equal to 1020 kg · m −3 in NEMOMED8), g the gravitational acceleration (9.81 m/s 2 ), and α and β the thermal and saline expansion coefficients (respectively equal to 2 × 10 −4 • C −1 and 7.6 × 10 −4 in NEMOMED8). HF and WF are counted positive downward. As they are mostly negative (heat loss and water loss) for this area and this period of the year, BL is always positive. The interannual time series of BL computed over the GoL area and of its heat-related and water-related contributions are plotted in Fig. 8. BL has a mean value of 0.76 m 2 /s 2 , an interannual standard deviation of 0.20 m 2 /s 2 and no significant trend. The mean value and the interannual variability of the buoyancy loss are dominated by the heatrelated term (heat-term, mean: 0.67 m 2 /s 2 , std: 0.18 m 2 /s 2 , correlation of 0.996 with BL). The 5 years identified as strong convective years (1981,1999,2005,2009,2013) occur during years with above-normal buoyancy losses (BL > 0.9 m 2 /s 2 ) but other years can also be identified by their strong buoyancy loss such as 1985, 1987, 2010 and 2012. The 1990s appear as a low BL period in particular with 2 years below 0.4 m 2 /s 2 (1990 and 1997).
The winter cumulated buoyancy loss over the North-Western Mediterranean Sea is not spatially homogeneous. With respect to the BL index (averaged over the GoL area), it is for example larger at the LION point (mean: 0.85, std: 0.25 m 2 /s 2 ) and weaker when averaged over the NWMED area (mean: 0.63, std: 0.14 m 2 /s 2 ). However, the temporal correlations between BL and the LION time series  Figure 9 shows the interannual relationship between the BL index in x-axis and the yearly maximum MLD (Fig. 9a), the yearly maximum convective surface (Fig. 9b) and the DWF rate (Fig. 9c) as previously defined. The interannual correlations are significant at the 99 % confidence level for three parameters characterizing the DWF phenomenon with the respective values of −0.75, 0.80 and 0.69. In addition, the scatterplots show that deep convection and DWF are not possible with BL below 0.6 m 2 /s 2 as the maximum MLD remains thinner than 1000 m (the approximate depth of the 29.10 isopycne) and the convective surface and DWF rate remain negligible. With a BL above 0.9 m 2 /s 2 , bottom convection (maximum MLD above 2000 m) always occurs and DWF rate always exceeds 0.4 Sv except for year 1987. It is however clear that the BL index is not the only factor explaining the DWF strength, as strong or weak DWF rates can occur with similar BL.
It is worth mentioning that the BL index is impossible to evaluate with the state-of-the-art observations or reanalyses. Flux reconstructions based on observations and allowing to compute all the components of the buoyancy loss are nearly inexistent and limited to local buoys with difficulties to estimate their representativeness at the GoL scale. In addition, the current generation of global model reanalyses are known to produce air-sea fluxes at the Mediterranean Sea scale that are inconsistent with the Gibraltar Strait heat and water contraints (Sanchez-Gomez et al. 2011;Hamon et al. 2016) and can therefore not be used as references at the sub-basin scale as illustrated in .

Role of the air-sea flux daily variability
We now investigate how the daily variability of the air-sea fluxes can influence the DWF phenomenon. First we compute the daily buoyancy loss averaged over the GoL area for every days of the DJFM winter period. Then we classify the winter days in classes depending on their buoyancy loss and we count the number of days with buoyancy loss above a given threshold (see Fig. 10). Five different thresholds are applied: 0.01, 0.02, 0.03, 0.04 and 0.05 m 2 /s 2 . They give 1980-2013 averaged values of 28.0, 9.4, 2.5, 0.55 and 0.15 days/winter respectively with strong interannual standard deviations (8.8, 5.5, 2.0, 0.71, 0.44 days/winter). For sake of simplicity, we keep hereinafter only the number of winter days with a daily buoyancy loss higher than 0.02 m 2 /s 2 and we call them the stormy days. Stormy days represent on average less than 8 % of the winter days. Four out of the five strong convective years (1981,1999,2005,2013) occur during winters with above-normal number of stormy days (resp. 16, 13, 22, 21 stormy days) whereas the fifth year (2009) counts only 9 stormy days. On the contrary, non-convective years show a below-normal number of stormy days. Finally, the temporal correlation between the stormy day occurrence and the BL index is significant at the 99 % confidence level and equal to 0.92 meaning that the BL interannual variability is mostly explained by the stormy day occurrence. Consequently the correlation between the stormy day occurrence and the DWF rate is also largely significant (0.70, significant at the 99 % confidence level) meaning that the stormy day occurrence explains 50 % of the variance of the DWF rate. This confirms, in a realistic simulation framework and over a long period of time, the driving role of a small number of winter storms to set the DWF intensity as already underlined in a pioneer academic study (Madec et al. 1991a). We now try to identify the daily large-scale atmospheric patterns that are in favour of the stormy days and associated strong winter integrated buoyancy losses. For this purpose, we use the daily Weather Regimes (WR) defined in Sect. 2.4. Figure 11 shows the anomaly of the number of winter days for each WR. We choose to use occurrence anomalies in order to underline the winters with above-normal and below-normal WR occurrences. The years (1981,1985,1987,1999,2005,2010,2012,2013) with largest stormy day occurrence (>13 days/winter) and BL index (>0.9 m 2 /s 2 ) always occur during years with above-normal occurrences of the AR (1981,1999,2005,2012) and/ or below-normal occurrence of the NAO− (1981,1985,1987,2005,2010,2012,2013). This includes four of the five strong convective years (1981,1999,2005,2013). The 5th strong convective year (2009) is again an exception with normal occurrences for each WR. For this year with nearly-average conditions, the high BL index is explained by a large number of days with a buoyancy loss above 0.01 m 2 /s 2 instead of above 0.02 m 2 /s 2 (see Fig. 10).
The connections between the BL index, the number of stormy days and the WR occurrence are confirmed by the temporal correlations computed in Table 2. The AR weather regime is in favor of the occurrence of stormy days , of high BL index and then of high DWF rates whereas the NAO+ weather regime is unfavourable. The correlations with the BK and NAO− occurrences are not statistically significant. Those results are not surprising as the AR weather regime is responsible for northerly atmospheric flow leading to cold and dry air mass advection over the area of interest. Concerning the NAO+ influence, this weather regime is associated with westerly or south-westerly atmospheric flows relatively warm and humid and an anticyclonic situation over the Mediterranean area, not favorable to northerly strong winds.  Table 2). This means that the winters with a large number of AR days and a small number of NAO+ days are the most in favor of strong BL index and of strong DWF in the Gulf of Lions. This is the case for example for years 1981, 1996, 2000, 2004, 2005, 2006, 2012 and 2013. This combined index explains around 40 % of the total interannual variance of the time series of the BL index and of the stormy day occurrence.
To verify the role of each WR, we compute composites of the total daily-mean buoyancy loss anomaly for each of the four weather regimes on average over the 33 winters. Figure 12 clearly shows that the AR regime induces positive buoyancy loss anomalies over the whole North-Western Mediterranean Sea with an enhanced pattern in the preferred zone of the Mistral and Tramontane regional winds off the French coast. It is therefore logical that a winter with a high number of AR days leads to high BL index and is prone to large DWF rate. As expected, the We can also note the signature of the Rhone river influence on the haline buoyancy flux in the composites of the NAO+ and NAO− weather regimes. NAO+ (resp. NAO−) leads to less (resp. more) precipitation over the Southern Europe then less (resp. more) river runoff discharges and an associated positive (resp. negative) anomaly of buoyancy loss. In the case of NAO+, the positive anomaly of the haline component of the buoyancy loss is able to counterbalance locally the negative anomaly of the thermal component. This may influence the shelf convection variability but probably less the open-sea deep convection we are studying here.

Role of water column preconditioning
In addition to the winter air-sea fluxes, the pre-winter water column preconditioning has also been identified in the literature as a driver of the interannual variability of the DWF (Lascaratos and Nittis 1998;Somot 2005;Herrmann et al. , 2010L'Hévéder et al. 2013). This preconditioning is quantified here by the stratification index defined by Eq. 1 page 12 For every year Y of the period 1981-2013, we compute the stratification index at the date of December 1st preceding the year Y. It means that December 1st 1980 is used to characterize the preconditioning before the winter 1980-1981. The stratification index has the same unit as a time-integrated buoyancy loss and it represents the buoyancy to be removed from the water column to allow mixing to a given depth (H) in the 1D assumption, neglecting buoyancy advection. This index can also be called residual buoyancy. For Fig. 13, we compute it using the bottom as the depth of reference (H is equal to the local depth) and at two different locations, at the LION location (dashed blue line) and on average over the GoL (full blue line). Note that the December 1st value is obtained as the average of the November and December monthly-mean values as only monthly-mean 3D files are available. We evaluated the error made with respect to the true December 1st value of the model for 7 years for which daily data are available for this run: over the GoL area and for these 7 years, the maximum bias is equal to 0.09 m 2 /s 2 with a mean bias of 0.009 m 2 /s 2 , a root mean square error of 0.047 m 2 /s 2 , a ratio of standard deviation of 0.88 and a correlation of 0.97. Using the root mean square error, we then consider that our estimate of the stratification index on December 1st in the model is good at ±0.05 m 2 /s 2 . The bottom stratification index averaged over the GoL area has a mean value of 0.98 m 2 /s 2 and standard deviation of 0.11 m 2 /s 2 . It does not show any significant trend. As expected, the bottom stratification index computed at the LION location displays smaller values (mean: 0.64 m 2 /s 2 ) and a larger variability (0.16 m 2 /s 2 ). We recall  Dotted zones When the values are significantly different from zero with respect to the intra-class standard deviation that this stratification index acts as a barrier to the convection: the stronger the index, the weaker the convection. It means that, following the 1D hypothesis, convection will occur more easily in the central zone of the basin (close the LION point) where a minimum stratification is detected (see also Fig. 4) and a maximum deep MLD occurrence (Fig. 2c). The stratification at the LION point is then a good indicator to determine whether deep convection is likely to occur for a given winter but it is less informative when trying to quantify the volume of the newly-formed dense water. Indeed bottom convection can occur at this location which is nearly the weakest point of the regional stratification without giving indications on the surface of the convected area. The stratification index averaged over the GoL area is more representative of the resistance of the whole zone to deep mixing and will probably be more discriminant to explain the interannual variability of the DWF rate. We note that both time series are relatively well correlated (0.82) even if we can find years where the LION stratification is very close to the GoL stratification (difference < 0.2 m 2 /s 2 for 1986,1988,2004,2012) and years where the LION stratification can be larger than the GoL stratification by more than 0.4 m 2 /s 2 (1982,1990,1992,1999,2006,2007,2011). Note besides that the stratification index computed over the larger NWMED area (not shown) is not very well correlated with the GoL index (0.58).
The bottom stratification index averaged over the GoL area for December 1st is chosen and is simply called SI hereinafter to be compared to the BL index. Concerning the mean value of the SI (0.98 m 2 /s 2 ), the surface layer (0-150 m) containing the AW accounts for 52 % of the mean value whereas the intermediate layer (150-1000 m) containing the LIW and the deep layer (below 1000 m) respectively account for 36 and 12 %. Obtaining an adequate vertical stratification of all the layers of the North-Western Mediterranean Sea is therefore mandatory to simulate adequately open-sea deep convection and DWF in the area. However concerning the interannual variability of the SI, the surface layer explains 86 % of the variance with a correlation of 0.93 and is therefore the main driver of the interannual variability of SI. To go one step further, the interannual variability of the modelled SI is largely explained by the salt content of the surface layer (correlation of −0.68) and by its heat content (correlation of +0.71) knowing that both quantities are decorrelated. A salty (resp. cold) surface layer leads to a less stratified water column, more prone to convection. This confirms results suggested by Sannino et al. (2009) in which more mixing at the Strait of Gibraltar leads to saltier surface waters in the Western Mediterranean Sea and then to enhanced convective activity in the NW Mediterranean Sea.
Determining the controlling factors of the interannual variability of the salt and heat content of the North-Western Mediterranean Sea is a complex issue that mixes many possible contributions and is outside the scope of the study. However this includes at least the interannual signal coming from the near Atlantic Ocean across the Gibraltar Strait (Millot 2007) and the mixing processes at the strait and all along the AW path, for example by the Algerian Current eddies.
Years with high SI (> 1.1 m 2 /s 2 , 1987,1993,2007,2012) and low SI (< 0.9m 2 /s 2 , 1981, 1984, 1996, 1999, 2000, 2009, 2011) can be identified on Fig. 13. They do not always correspond to non-convective years and to strong convective years but they allow to explain unexpected behaviour of some years previously noticed. For example, 1987 and 2012 are among the years with the highest BL of the period (see Figs. 8, 10, 13) but are not in the list of the strong convective years. This is probably because of their strong SI. On the contrary, 1996 and 2000 with a relatively low BL (0.6-0.7 m 2 /s 2 ) are able to produce a large amount of DW (0.4-0.5 Sv) because of their low SI. The SI parameter also explains why 1999 and 2009 are in the list of the strong convective years with a BL equal to 0.9 m 2 /s 2 but not 2012 with the record BL (1.13 m 2 /s 2 ). This relationship between the SI and the DWF phenomenon is however not easily found using time series correlations. Indeed, the SI is not significantly correlated with the yearly maximum MLD times series nor with the yearly maximum convective surface. The only significant correlation (at the 95 % confidence level) is between the SI and the DWF rate with -0.40 showing that strong SI indeed inhibits strong DWF.
It is worth mentioning that the SI parameter is nearly impossible to evaluate with observation-based indicators except for using dense and deep CTD networks around December 1st. Moorings such as the LION facility only allow to compute stratification index estimate at one given location and for the upper 1000 m of the ocean but those values are not representative of the GoL scale as illustrated in Fig. 13. We however use these mooring-derived values to evaluate the 1000 m stratification index of the model at the LION location (see dotted line for the model and blue dots for the observations on Fig. 13). Taking into account the large error bars associated to the observed estimates, the model performs relatively well. In particular, the model and the observations agree on the interannual variability with a stronger stratification in 2012 and relatively weaker values in 2009 and 2011 confirming the results obtained on average over the GoL.

Relative role of BL and IS
In this section, we further investigate the relative role of both indices defined above, on one side the winter (DJFM) integrated buoyancy loss averaged over the GoL noted BL and on the other side December 1st bottom stratification index averaged over the same area and noted SI. Both indices are shown in Fig. 13. On average over the 1980-2013 period, the SI has the same order of magnitude as the BL index for the mean value and for the standard deviation. This means that both indices can oppose to each other to set the mean behaviour of the DWF and its interannual variability. Moreover, on average over the 1980-2013 period, BL is weaker than SI. This means that, on average and when neglecting the lateral advection of lighter waters, the winter air-sea fluxes are not strong enough to allow the whole GoL to mix up to the bottom. For 5 particular years however, BL is stronger than SI. Those years (1981,1999,2005,2009,2013) are exactly the ones defined previously as the strong convective years (DWF rate >0.6 Sv). Note that a DWF rate equal to 0.6 Sv corresponds to the replacement of 16 % (resp. 5 %) of the GoL (resp. NWMED) total volume for a 1-year period. This situation can be related to extremely strong BL values such as in 1981, 2005 and 2013 or to extremely low SI values with strong BL such as in 1999 and 2009. In addition, 7 years are characterized by a BL far below SI (absolute difference greater than 0.4 m 2 /s 2 ): 1989, 1990, 1993, 1997, 1998, 2001, 2007. Those 7 years are all included into the list of the non-convective years and they show a maximum MLD below or equal to 1000 m (see Fig. 5). This means that the combination of both indices is required to explain the DWF intensity, in line with results obtained by Herrmann et al. (2010) and Grignon et al. (2010) Figure 14 clearly shows that for a given BL (resp. SI), the DWF rate decreases (resp. increases) when the SI (resp. BL) increases. We also notice that the range of values is smaller for the SI than for the BL due to a smaller interannual standard deviation. It means that the BL has more capacity to influence the DWF rate than the SI at the interannual time scale.
From our analysis, we can conclude that the combined index (BL-SI) is very discriminant for the DWF intensity. We verify its predictive power in Fig. 15 using scatterplots for the three DWF indicators: the yearly maximum MLD (m), the maximum convective surface (MLD > 1000 m, m 2 ) and the DWF rate (Sv). The interannual variability of the three DWF indicators is very well explained by the BL-SI combined index with respectively 0.85, 0.87 and 0.86 for the temporal correlation (all significant at the 99 % confidence level). This means that it explains more than 72 % of their variance.

Characterization of the trends in the deep water masses
The literature suggests that a warming and saltening trend is on-going in the deep water of the North-Western Mediterranean Sea (e.g. Béthoux et al. 1990;Rixen et al. 2005;Zunino et al. 2009). To check this behaviour in the model, we first compute the monthly-mean time series of the deep water characteristics at 2300 m on average over the GoL area and at the LION point (potential temperature, salinity and potential density) where reference data are available (Fig. 16). The model simulation shows a very stable situation for the first 20 years of the run with θ-S-ρ characteristics that remain close to the initial conditions after spin-up. A warming and saltening trend starts however in 1999. From this date, the salinity and temperature increase stepwise almost after each convective winter that is marked by a peak of abnormal characteristics related to the newly formed DW. The peaks and the stepwise increases are very marked at the LION point (black lines) and smoothed when the GoL averaged characteristics are shown (red lines). The years already identified as strong convective years (1981,1999,2005,2009,2013)  Comparisons between the model and the near-bottom observations near the LION point show a mixed behaviour: first the simulation shows cold, fresh and dense biases since the beginning of the run. This means that the initial conditions of the run after spin-up are not completely consistent with the observations. Secondly, the monthly chronology of the model does not correlate well with the observations of the two deep moorings over the common period but the modelled monthly variability range fits well for temperature and salinity. Finally, the modelled trends are consistent with the observed trends for the temperature (3 × 10 −3 • C/year) but underestimate the salinity (17 × 10 −4 /year) and the density (65 × 10 −5 kg.m −3 /year) trends. Besides, the signature of the shelf cascading (cold and dense anomaly) recorded during winter (Durrieu de Madron et al. 2013 is not reproduced by the model. It is worth noting that, although the two moorings (MOOSE-LION and HYDROCHANGES-LION) are only some kilometers apart and record at a maximum pressure difference of about 20 db, the time series exhibit sometimes notable differences, with salinity and potential density being smaller in the HYDROCHANGES mooring that is though deeper (Fig. 16). If some issues on the very fine calibration of the probes cannot be excluded, it is important to note that such a small spatial variability may be real and is also exhibited by the yearly CTD profiles of MOOSE-GE campaigns (not shown), made during summertime. Therefore it must be kept in mind, when computing deep water trends from deep sensors, that isolated observed values are representative of the water mass only at about ± 0.005 kg.m −3 at best, that is to say during summertime.
In order to give a more integrated view of these DW trends, Figure 17 shows the monthly time series of dense water volume (in m 3 ) for the NWMED area and for waters denser than 4 different thresholds (29.10, 29.11, 29.12 and 29.13 kg/m 3 ). Figure  to 1999 with a major part of the DW constituted of waters denser than 29.10 but lighter than 29.11 as expected from the literature or from gridded climatology (see the dashed line in Fig. 17 that represents the 1980-2002 average value of the dataset Rixen et al. 2005). Up to 1999, the waters denser than 29.10 kg/m 3 represent on average 1.7 × 10 14 m 3 in summer that is to say 41 % of the NWMED basin volume and reach a shallowest depth in summer between 900 and 1100 m depending on the year (not shown). From 1999, the volume of waters denser than 29.10 kg/m 3 increases stepwise mainly in 1999, 2005, 2009 and 2013 but not only. Despite this overall increase between 1999 and 2013, linear decay is possible between 2 convective years. For this density class, the simulation is in very good agreement with the gridded products over the 1980s and 1990s (Rixen et al. 2005, see Table 1  Vol>29.11 Vol>29.12

Fig. 17
Simulated monthly time series of the volume of dense water above density thresholds for the NWMED area (in m 3 ), black line for the 29.10 kg/m 3 threshold, red for 29.11 kg/m 3 , blue for 29.12 kg/m 3 and green for 29.13 kg/m 3 . Observation-based estimates are represented by full circles for MOOSE and DeWEX field campaigns (Waldman et al. 2016) and by a dashed line for the 1980-2002 average value of Rixen et al. (2005) Peaks of waters denser than 29.11 kg/m 3 appear during the convective years 1981 and 1999 but this water class disappears after some months. Signifiant volumes of waters denser than 29.11 kg/m 3 appear definitively in 2005 in agreement with Schroeder et al. (2008) and their volume increases stepwise since this date in the simulation. Those results are in agreement with the DWF rate time series (Figure 7) that shows large formation rate for year 1981 (0.52 Sv) and 1999 (0.38 Sv) when using the 29.11 kg/m 3 threshold. From 2005, the DWF rate at 29.11 kg/m 3 can even dominate the DWF rate at 29.10 kg/m 3 such as in 2005, 2009, 2012 and 2013. The largest values of DWF rate are actually obtained for this 29.11 kg/m 3 threshold with 1.7 Sv in 2013 and 1.6 Sv in 2005. Note that the model 2013 DWF rate at 29.11 kg/m 3 is also equal to 1.7 Sv when using daily model outputs instead of monthly-mean files. It is on the upper range of the observed value error bars obtained by Waldman et al. (2016) over their domain of study (1.4 ± 0.3 Sv) but underestimates their extrapolated value obtained for the North-Western Mediterranean Sea and with the optimal minimum and maximum DW volumes (2.3 ± 0.5 Sv). The emergence of this 29.11 density class is also noticed in the observed-based estimates on Fig. 17 but with a stronger increasing rate between the 1990s and August 2013. These results confirm the conclusions of

Explaining the deep water mass trends
Saltening and warming trends of the deep water masses have been derived from the observations and the model simulation. The modelled trend is significant at the 99 % confidence level from 1999 onward (see Figs. 16,17). However up to now, none of the driving factors (BL, SI, weather regime occurrence, stormy day occurrence) shows a significant trend. This probably means that the factors driving the trends in the DW characteristics are not the same as the ones driving the DWF interannual variability.
These trends are far from being linear. Indeed the trend in the surface layer heat content is not significant after 1985, the trend in the surface salt content and in the intermediate heat and salt contents are not significant after 1995 and finally the trends in the deep layer are not significant before 1999. So, the trends occur first in the surface layer, then in the intermediate layer and finally in the deep layer. It seems that everything happens as if the deep trends come from the surface and intermediate layers with a time lag. This time lag is probably due to (i) the large inertia of the deep layer (a long time is required to warm this layer) and (ii) the fact that the 1990s is a period of weak convective activity not favourable to the export of the surface and intermediate trends towards the deep layers and therefore allowing an accumulation of heat and salt in those layers (see also , for this hypothesis). For a deep export of the surface and intermediate anomalies, a very strong convective year (1999) or a series of deep convective years (2003, 2005, 2006 and then 2009-2013) is required.
We are now looking for the origin of the surface water trends: at the scale of the whole Mediterranean Sea or of smaller sub-basins, the trends in the Mediterranean Sea surface net heat flux and net water flux are not significant over the simulated period. In the run, the only forcing with a significant trend (at the 99 % level) is the surface water characteristics in the near-Atlantic buffer zone imposed by the global ocean reanalysis. Indeed, the salt content of the 0-150 m layer of the near Atlantic Ocean is significantly increasing (+5.0 × 10 −3 /year) in agreement with Millot (2007). This signal is transferred to the Mediterranean SSS which also increases by +5.0 × 10 −3 /year. Following the path of the AW, this signal of increased surface salinity influences directly the surface layer of the North-Western Mediterranean Sea and indirectly the intermediate layer of the Mediterranean Sea, the so-called LIW, through its formation process in the Eastern Mediterranean Sea. In the intermediate layer, the heat and salt contents are well correlated as convection nearly always occurs at the same density. Therefore saltier LIW means warmer LIW especially for the LIW transiting at the Sicily Strait. The surface and intermediate layers being saltier, deep convection can occur at a warmer temperature leading to consistent warming and saltening trends in the deep layer of the North-Western Mediterranean Sea.
According to model results, we conclude that the trend in the near-Atlantic surface water characteristics is the initial driver of the observed trends of deep water mass characteristics in the North-Western Mediterranean Sea. Its influence is double, directly through the AW path with a time lag of a few years and indirectly through the formation of warmer and saltier intermediate waters in the Eastern Mediterranean basin with a time lag of about 10-20 years (Robinson et al. 2001). Note that the arrival of a warmer and saltier intermediate water in the Western Mediterranean Sea could be linked with a change in the LIW characteristics during its formation and its travel, or to a varying mixing rate of the LIW with other intermediate waters originated from the Eastern basin such as the Cretan Intermediate Water (CIW).

Discussion
In this study, we aggregate long-term and quantitative indicators of the DWF for the North-Western Mediterranean Sea using historical observation data and recent long-term monitoring facilities. We are aware of the limits of this approach. The observation-based indicators indeed suffer from undersampling errors in space and time that limit their ability to capture a phenomenon as sporadic and spatially limited as DWF. The sampling errors probably lead to underestimate the true yearly maximum MLD and the true yearly maximum extension of the convective zone in the observations. We tried to avoid this limitation by computing the yearly maximum values of the MLD and of the convective area and not the MLD or the convective area at a given date as it could have been possible in the model simulation. We also acknowledge that some years are better sampled than others and that the probability to miss deep MLD is not the same for each year. To establish the maximum MLD criteria, we have therefore discarded the years having an unsufficient amount of observed vertical profiles during the winter months, and to establish the maximum extension of the convection zone, we have discarded the years having a too sparse in-situ data coverage or an unsufficient amount of usable satellite images. However this lack of data coverage could sometimes explain the large discrepencies obtained when comparing the model and the observations such as possibly in 1985in , 1988). In a future study, those errors could be estimated using Observation System Simulation Experiments (OSSE) as for example done in Waldman et al. (2016) for the DWF rate of winter 2012-2013 or in Llasses et al. (2015) for the long-term basin-scale trends.
Another questionable choice we made concerning the model evaluation is that we didn't always use the same way to compute the indicators in the model and in the observations. Indeed computations using the observations are sometimes very specific of the instruments (e.g. the criterion of the MLD definition at the LION facility varies with the depth because the sensor precision is different for the surface buoy and for the deep mooring; we use ∆T criterion in order to exploit XBT profiles before the CTD era) or of the sampling (e.g. we use a not very strict criterion to define MLD from in-situ profiles in order to recover deep MLD below shallow restratified layers). These adaptations of the criteria used to compute the observation-based indicators are relevant in the observation world, but not in the model world especially as we would like to inter-compare various models in future studies or different configurations of the same model. We then decided to compute on one side, the best indicators possible in the observation world and on the other side, the best indicators in the model world. We consequently lose some observation-model consistency but we win in terms of generalization of our study. One example of this choice is that the DW volume (Fig. 17) and the DWF rate (Fig. 7) were not computed exactly on the same domain in the model (NWMED area) and in Waldman et al. (2016). Indeed the domain chosen for the observation-based estimates (2.5 • E-9 • E, north of 40 • N and local bathymetry deeper than 2000 m) takes into account constraints related to the optimal interpolation of a 2D field and the scarcity and geography of the MOOSE-GE CTD network, and has finally a volume 20 % smaller than our NWMED area. Extrapolating the Waldman et al. (2016) estimates over the NWMED domain is not trivial and relies on the imperfect model simulation used for the OSSE. This extrapolation gives a multiplying factor of about 1.1 for the DW volumes (to be applied to the estimates of Table 1; Fig. 17) and of 1.63 for the DWF rate at 29.11 kg/m 3 , leading to a DWF rate equal to 2.3 ± 0.5 Sv instead of 1.4 ± 03 Sv for this density class (R. Waldman, pers. comm.).
To establish the relationships between the driving factors (BL, SI, stormy days occurrence, weather regime occurrence) and the DWF rate, we considered a DWF rate computed with a constant density threshold equal to 29.10 kg/m 3 . However, Figs. 3, 7, 16c and 17 show that the density of the deep and bottom water masses is changing with time and that a constant threshold is probably not adapted to study the recent years. In particular, Fig. 7 shows that four years (2005, 2009, 2012 and 2013) have a DWF rate larger using the 29.11 kg/m 3 threshold instead of 29.10. We therefore checked that using the DWF rate time series with the 29.10 constant threshold or using the DWF rate time series with an optimal density threshold (that is to say the threshold that maximizes the DWF rate) does not change the results concerning the interannual variability correlations. In particular the correlation between the optimal DWF rate time series and the BL-SI time series is equal to 0.84, very close to the 0.86 value obtained for Fig. 15c. Note that the optimal DWF rate time series shows a mean value of 0.35 Sv (instead of 0.28 Sv), an interannual standard deviation of 0.48 Sv (instead of 0.36 Sv) and no significant trend at the 95 % level.
For the current study, we consider that we used the best model configuration available for us today (see a review in Ruti et al. 2015). However we are aware that this configuration is a compromise between high-resolution in the atmosphere and in the ocean, long-term simulation, temporal homogeneity, water mass stability, observed synoptic chronology and coupling between the various components of the regional climate system. Higher spatial resolutions for the ocean model and for the atmosphere model would probably improve the way to represent the DWF phenomenon (Herrmann et al. , 2011 and its variability. Some other model weaknesses not related to resolution such as the representation of the aerosols, the biases in the air-sea fluxes or the biases in ocean circulation or water mass characteristics must also be kept in mind (Sevault et al. 2014;Nabat et al. 2015) and may have influenced the results obtained. In particular, the model has probably a too strong latent heat flux due to the use of Louis (1979) parameterization in ALADIN version 5 that is partially compensated by a too strong downward shortwave radiation flux at the basin scale due to a lack of cloud cover.
Besides, the relationships found in this study between the various DWF indicators and the driving factors (SI, BL, stormy days occurrence, . . . ) are only based on one model, one simulation and on a relatively short period (33 winters). It is therefore important to verify these results in a multi-model framework to establish their degree of robustness. The definition of the two main driving factors (SI, BL) derives from the fact that we consider, at the first order, that DWF in winter is mostly a 1D phenomenon in which the autumn stratification must be destroyed by the atmospheric buoyancy loss. This means somehow that we make the assumption that the interannual variability of the lateral buoyancy advection is negligible during the wintertime. In future studies, this assumption should be questioned by trying to find a third driving factor representing the interannual variability of the lateral buoyancy advection and in particular the contribution of the meso-and submeso-scale eddies.

Conclusion
In this study, a thorough reanalysis of past in-situ and satellite observations and a long-term hindcast simulation performed with a fully coupled Regional Climate System Model (CNRM-RCSM4) have been used in order to improve the characterization and the understanding of the past climate variability of the DWF phenomenon in the North-Western Mediterranean Sea over the period 1980-2013 (33 winters). We first establish reference long-term observation-based time series to quantitatively characterize the mean state of the phenomenon, its interannual variability and trends. This was possible only thanks to (1) a regular monitoring of the Gulf of Lions area in the past and (2) recent long-term monitoring facilities installed in the area since 2007 (HYDROCHANGES deep mooring, LION surface buoy and full-depth mooring, gliders, MOOSE observatory). Combining the various sources of observations, values have been obtained for the yearly maximum mixed layer depth (31 winters documented out of 33), the yearly maximum extension of the convective area (7 winters documented), the DWF rate (1 winter), the autumn stratification index (6 years documented), the bottom water mass characteristics evolution (monthly-mean values since 2007 and regular CTD profiles before) and the volume of dense waters (available for 2012 and 2013). These new observation-based indicators confirm results from previous studies: • strong interannual variability of the phenomenon in terms of depth reached (from shallow to bottom convection) and of spatial extension (up to 5.6 × 10 10 m 2 in 2013), • convection often reaching depths greater than 1000 m (52 % of the observed years with a value very likely underestimated due to sampling issues), • DWF rate that can be larger than 1 Sv for strong convective years, • an exceptional suite of 5 consecutive convective years from 2009 to 2013, • a period of low convective activity during the 1990s, • trend in the deep water mass characteristics (temperature, salinity, density, volume of the dense water) at least over the last 10-15 years, • strong interannual variability of the water column stratification in autumn.
We then use the observation-based indicators to evaluate the ability of the CNRM-RCSM4 model to simulate the DWF phenomenon in the North-Western Mediterranean Sea. The coupled model reproduces well the mean behaviour and the interannual variability of the phenomenon especially when keeping in mind the potential large errors related to undersampling in the observation-based indicators, the relatively coarse model resolution (approximately 10 km in the ocean and 50 km in the atmosphere), the complexity of the model (full coupling without constraint between 4 components of the climate system) and the fact that ocean observations are not assimilated in the model except for the ocean initial conditions, the Nile river and the near Atlantic Ocean.
In addition, the model simulates realistic trends in the deep water characteristics (warming, saltening, increase in the dense water volume for a given density threshold, increase in the bottom water density). Concerning the representation of the DWF phenomenon, the main model weaknesses are a missing potential convective area in the Ligurian Sea probably due to a local too strong vertical stratification, a cold and fresh bias in the bottom waters at the start of the simulation either due to the chosen initial conditions or to the spin-up, phase and a notable underestimation of the deep salinity and density trends. We assume that the good behaviour of the model allows to use the simulation to better characterize the variability of the DWF phenomenon and to improve the understanding of the main drivers of its interannual variability and of its longterm trends. Over the 33 winters studied, the model shows deep convection (yearly maximum MLD deeper than 1000 m) in nearly 2/3 of the cases, a mean yearly maximum extension of the convective surface equal to 1.1 × 10 10 m 2 and an average DWF rate equal to 0.28 Sv at the 29.10 kg/m 3 threshold (or 0.35 Sv with an optimal density threshold) with a strong interannual variability. Those values are generally in agreement with previous model studies. The list of the model strong convective years (DWF rate >0.6 Sv) includes 1981,1999,2005,2009 and 2013 but other years identified as convective in the literature are also convective in the model (e.g. 1987, 1996, 2010 and 2012). Concerning the interannual variability, the role of the interannual variability of the winter-integrated buoyancy loss (here computed from December 1st to March 31st) as the most important driver is confirmed. Alone, it explains around 50 % of the variance of the phenomenon. The interannual variability of the winterintegrated buoyancy loss is itself driven by the heat loss and is relatively homogeneous in space. In addition, it is strongly connected to daily atmospheric variability as few stormy days during the 4-month winter period explain most of the variability of winter-integrated buoyancy loss. The Atlantic Ridge weather regime has been identified as favourable to strong daily buoyancy losses whereas the positive phase of the NAO is unfavourable. Exceptional winter-integrated buoyancy loss is the main driver for some of the strongest convective years such as 1981, 2005 and 2013.
The role of the water column preconditioning before convection (here computed as a bottom stratification index on December 1st) has also been identified and quantified. This second driving factor is independent from the winter buoyancy loss and significantly influences the DWF interannual variability. In particular it allows to explain why convection is so strong in 1999 and 2009 in the model and why the convection is weaker for some years where the buoyancy loss is very high (e.g. 1987, 2012). Combining both factors allows to explain more than 70 % of the interannual variance of the DWF phenomenon. It is worth noting that the stratification index is very variable in space and time in the North-Western Mediterranean Sea and therefore difficult to estimate from observations. However the model is in agreement with the few observed estimates available for the stratification at the LION location and for the top 1000 m. The interannual variability of the stratification index is mostly driven by both the heat and salt contents of the surface layer which are not correlated between them.
Concerning the deep water mass trend, also often associated to the Western Mediterranean Transition, the model simulates the end of the steady state in 1999. From this winter, a stepwise increase in temperature, salinity and dense water volume is reproduced in the model with 1999,2005,2009,2012 and 2013 being the largest steps. In addition the model starts to simulate deep water denser than usual (denser than 29.11 kg/m 3 ) from winter 2005, this behaviour being confirmed by previous studies and by the deep observation time series. In the model, these trends are not explained by trends in the two main driving factors of the interannual variability. It is however related to trends in the characteristics of the surface and intermediate waters. Indeed AW and LIW are characterized by warming and saltening trends early in the simulation in the 1980s and the 1990s. Then, it seems that these trends induce a heat and salt accumulation during the 1990s in the surface and intermediate layers of the Gulf of Lions before being transferred towards the deep layers when very convective years occur from 1999 and afterwards in the 2000s and 2010s. Based on the model results, we conclude that the only external forcing that can explain these trends is the salinity increase in the near Atlantic Ocean surface layers. It is worth noting that in-situ observations are so far missing to state about the origin of the forcing that induces warming and saltening of the waters in the North-Western Mediterranean Sea (internal or external forcing, human-induced or natural variability). In this regard, monitoring the surface layer of the Gibraltar Strait is a proviso to improve the simulation of the evolution of the Mediterranean. More generally, we would like to underline that improving our understanding of the climate variability of the Mediterranean Sea requires long-term monitoring of the various Mediterranean water masses at key areas such as the DWF areas, using continuous surface buoys and deep moorings as well as repeated dense and deep CTD networks. The fine calibration and inter-calibration of the various sensors have also made this study possible.
Our results confirm and extend previous studies (e.g. Mertens and Schott 1998;Herrmann et al. 2010;L'Hévéder et al. 2013) using a different modelling framework, a more recent period and a more stable and more in-depth evaluated simulation. However we consider that the robustness of our results must be confirmed in a multi-model study. The relationships established here may also allow to better understand the behaviour of the DWF phenomenon in Mediterranean Sea simulations in hindcast, forecast, reanalysis or future climate change scenario modes.