Role of wind stress in driving SST biases in the Tropical Atlantic

Coupled climate models used for long-term future climate projections and seasonal or decadal predictions share a systematic and persistent warm sea surface temperature (SST) bias in the tropical Atlantic. This study attempts to better understand the physical mechanisms responsible for the development of these systematic biases using the so-called Transpose-CMIP protocol in a multi-model context. Six global climate models have been used to perform seasonal forecasts starting both in May and February over the period 2000–2009. In all models, the growth of SST biases is rapid. Significant biases are seen in the first month of forecast and, by 6 months, the root-mean-square SST bias is 80% of the climatological bias. These control experiments show that the equatorial warm SST bias is not driven by surface heat flux biases in any of the models, whereas in the south-eastern Atlantic the solar heat flux could explain the setup of an initial warm bias in the first few days. A set of sensitivity experiments with prescribed wind stress confirms the leading role of wind stress biases in driving the equatorial SST bias, even if the amplitude of the SST bias is model dependent. A reduced SST bias leads to a reduced precipitation bias locally, but there is no robust remote effect on West African Monsoon rainfall. Over the south-eastern part of the basin, local wind biases tend to have an impact on the local SST bias (except in the high resolution model). However, there is also a non-local effect of equatorial wind correction in two models. This can be explained by sub-surface advection of water from the equator, which is colder when the bias in equatorial wind stress is corrected. In terms of variability, it is also shown that improving the mean state in the equatorial Atlantic leads to a beneficial intensification of the Bjerknes feedback loop.
In conclusion, we show a robust effect of wind stress biases on tropical mean climate and variability in multiple climate models.


Introduction
Despite efforts made and progress achieved by the climate modelling community in the last few decades, state-of-the-art coupled General Circulation Models (CGCMs) still exhibit severe errors in some regions of the world. These have persisted for several model generations. One of the most outstanding and persistent systematic model errors is the prominent warm sea surface temperature (SST) bias in the Tropical Atlantic (Richter and Xie 2008; Richter et al. 2014; Xu et al. 2014). This severe SST bias can potentially affect the simulation of the Tropical Atlantic climate, its multi-scale variability and also the reliability of climate predictions and projections in that area (Stockdale et al. 2006; Richter et al. 2018 and references therein). Improving climate models in the Tropical Atlantic is crucial to reduce the uncertainty in model simulations at all timescales.
The tropical Atlantic SST bias has received particular attention in the South-Eastern Tropical Atlantic (SETA) subregion, where the causes of the SST bias have been widely documented in both single- and multi-model studies. A number of local and remote physical mechanisms have been suggested (see the reviews by Richter 2015 and Zuidema et al. 2016). In general, these studies point to biases in the atmospheric component of CGCMs being mostly responsible for the initial development of SETA errors, which propagate to the ocean component at longer timescales (Toniazzo and Woolnough 2014; Goubanova et al. 2018). However, results from stand-alone ocean simulations reveal that systematic errors in the ocean component also contribute significantly to the SETA SST biases (Xu et al. 2013; Exarchou et al. 2018).
Among the main local atmospheric causes of the SETA SST biases is the insufficient low level humidity and stratocumulus cloud cover, resulting in an excessive shortwave radiation net flux into the ocean (Giese and Carton 1994;Ma et al. 1996;Huang et al. 2007;Hu et al. 2008;Hourdin et al. 2015). Another local cause is deficient coastal upwelling and subsequent increase of SSTs, which is generally attributed to the resolution of both the atmospheric (Cambon et al. 2013;Machu et al. 2015) and ocean components (Grodsky et al. 2012) of the CGCMs being too coarse. On one hand, in coarse resolution atmospheric models, an alongshore wind stress that is generally too weak leads to an underestimated offshore Ekman transport of cold water (Milinski et al. 2016;Goubanova et al. 2018), whereas an unrealistic wind stress curl near the coast results in errors in vertical velocities and in a too strong alongshore warm southward meridional current (Small et al. 2015;Koseki et al. 2018). On the other hand, coarse resolution ocean models fail to represent the intensity and dynamics of the alongshore cold Benguela current, which results in a diffuse SST gradient over the Angola-Benguela Frontal Zone (ABFZ) associated with warm SST errors in the SETA region (Xu et al. 2013).
Remote sources for the warm SST bias have also been identified. Results from multi-model studies, mainly from the CMIP (Coupled Model Intercomparison Project, Taylor et al. 2012) database, have shown that the excessive SST warming over the SETA can also be related to the well-documented westerly wind biases at the equator simulated by CGCMs in spring (Richter et al. 2012; Toniazzo and Woolnough 2014; Xu et al. 2014; Richter 2015). The equatorial westerly wind bias leads to an erroneous east-west tilt of the thermocline, and hence a reversal of the zonal SST gradient at the equator. A deeper thermocline in the eastern equatorial Atlantic prevents the cold tongue from developing in boreal summer (June-August) and results in a strong regional warm SST error along the equator. The role of the remote wind forcing at the equator in setting the SETA SST bias has been confirmed through coupled model sensitivity experiments in which the model wind stress is replaced by observed values (Wahl et al. 2011; Richter et al. 2012; Voldoire et al. 2014; Goubanova et al. 2018). Voldoire et al. (2014) show a significant reduction of the warm SST bias in the SETA when a correct equatorial atmospheric circulation is imposed on the coupled model, and that the subsurface ocean dynamics play a crucial role (horizontal advection and/or Kelvin waves propagating along the equator and then along the western coast of southern Africa). However, the studies mentioned above are each based on one single coupled model, and the relative importance of the remote wind forcing in setting the warm SST bias in the SETA seems to be model dependent. Moreover, the role of the local wind forcing in modulating the remotely forced SST bias has not been generally assessed.
As part of the EU-funded FP7-PREFACE project (Enhancing prediction of Tropical Atlantic climate and its impacts, http://preface.wb.uib.no/), the scope of this study is to characterise, understand, and quantify the role of the wind stress forcing in the development of SST errors in the Tropical Atlantic. We perform drift analysis on a multi-model ensemble of seasonal forecasts, an approach referred to as "Transpose-CMIP" (in reference to the well-known Transpose-AMIP framework; Williams et al. 2013; Ma et al. 2014). The drift analysis approach has already been used to identify physical processes responsible for the development of coupled model systematic biases both at seasonal (Huang et al. 2007; Vannière et al. 2013; Toniazzo and Woolnough 2014; Shonk et al. 2018) and decadal timescales (Hazeleger et al. 2013; Huang et al. 2014; Sanchez-Gomez et al. 2016).
The role of long term processes, involving ocean adjustments at longer timescales (e.g. ocean gyres, deep ocean circulation), is not considered in this study. In the first part of this paper, the SST bias development is characterised in seasonal forecasts (control experiments). Second, sensitivity experiments with wind stress replacement over key regions in the Tropical Atlantic following a common protocol are conducted, similar to the one implemented in Voldoire et al. (2014). The analysis focuses mainly on the mean state biases and their sensitivity to the wind stress representation, although the influence of mean state improvement on some crucial feedback mechanisms (i.e. Bjerknes feedback) associated with Tropical Atlantic variability and predictability is also investigated.
The paper is structured as follows: Sect. 2 describes the models involved in the study, the experimental setup and initialisation methods used. Section 3 focuses on the control initialised experiment and compares the drift evolution in the different models for key parameters and regions. Section 4 investigates the impact of wind stress replacement on the drift and the effect of improved mean state on the Bjerknes feedback. Results are discussed in Sect. 5.

Common protocol
We perform a set of initialised coordinated experiments of CGCMs in order to analyse model drift towards the equilibrium state and the physical processes by which the systematic errors appear (the so-called Transpose-CMIP protocol). The protocol used for model initialisation is identical to what is currently applied in seasonal and decadal forecasting. In the present case, and as discussed in the introduction, we focus on model adjustment at seasonal timescales, since previous studies (Toniazzo and Woolnough 2014;Voldoire et al. 2014) have shown that the Atlantic SST bias reaches equilibrium in a few months in the analysed models. Until recently, many operational seasonal forecasting centers used 1st February, 1st May, 1st August and 1st November as start dates. The present study focuses on winter (1st February) and spring (1st May) initialisations since the main interest is the development of summer SST bias.
As coupled model errors in the tropical Atlantic are quite systematic, there is no need for large ensemble sizes and long forecast periods. Therefore, modelling groups have performed three members for each start date (February and May), generated by perturbing the atmospheric initial conditions. Seasonal forecasts have been run for 6 months for the 10-year period 2000-2009. A common initialisation method has not been imposed; instead, a minimum requirement is that all the seasonal forecasts have been initialised at least at the ocean surface. Simple coordinated protocols have been used in order to ease the implementation of the experimental setup for all groups, including those who are not running seasonal forecasts routinely.
Six coupled models have participated in the multi-model seasonal forecast experiment, referred to as "CTRL". The main characteristics of the models are summarised in Table 1 with details on their ocean and atmosphere components, including resolution and initialisation product. The different initialisation methods will be discussed in the next section.
To evaluate the models' respective climatological errors (i.e. errors once the models have reached equilibrium), most groups have provided a long-term non-initialised simulation, called "CLIM". CLIM simulations are either historical simulations or long-term equilibrium simulations with constant greenhouse gas (GHG) and aerosol forcing, depending on the simulations already available with the same model version in each group. Differences in GHG may lead to differences in global mean SSTs, but several studies (e.g., Hourdin et al. 2015) have shown that the pattern of biases is very robust and does not depend largely on the external GHG forcing. Concerning other forcings with spatial patterns, like aerosols, the pattern remains similar in the CLIM and CTRL simulations, thus they only exert a minor influence on the bias pattern. To remove the effect of different global concentrations of aerosols and GHGs in CLIM and CTRL, we thus consider the SST biases relative to the mean tropical SST biases, evaluated as the averaged bias between 30°S and 30°N in CLIM experiments.
Based on the CTRL experiment, three idealised sensitivity experiments have been designed and performed with five of the models, in order to disentangle the role of the local wind stress errors in setting the large scale SST bias. We rerun the same seasonal experiments as CTRL, but with wind-stress replacement in three key regions of the tropical Atlantic (shaded regions in Fig. 1). In practice, the wind stress computed from the atmospheric model and passed to the ocean model component is replaced by the daily wind stress from the ERA-Interim reanalysis (ERAI, Dee et al. 2011). All other coupling fluxes are left unperturbed (similarly to Wahl et al. 2011 and Richter et al. 2012), meaning that turbulent heat fluxes are calculated using the simulated atmospheric wind.
The main characteristics of these sensitivity experiments are summarised in Table 2. In the first, named TAU30, the model wind stress is replaced by the ERAI wind stress over the whole tropical Atlantic basin between 30°S and 30°N. This experiment allows us to assess the role of basin-wide wind stress errors. In the second and third experiments the replacement is performed over much more limited regions: the equatorial band between 5°S and 5°N (experiment TAUEQ), where large-scale oceanic waves such as Kelvin waves are excited, and the coastal upwelling region from 10°S to 30°S and from the west coast of southern Africa to the Greenwich meridian (experiment TAUBE), where the wind drives local upwelling and the coastal current system (Small et al. 2015).
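The three replacement regions can be written down as a simple point-wise mask. The following is only a minimal sketch: the function names, the `is_atlantic_ocean` flag (standing in for a basin and land-sea mask, since the longitude limits of the basin are not given here) and the hard-edged boundaries are illustrative assumptions, not the implementation used by the modelling groups.

```python
def in_replacement_region(experiment, lat, lon, is_atlantic_ocean):
    """Return True where model wind stress would be swapped for reanalysis.

    lat, lon are in degrees (lon in [-180, 180), east positive).
    is_atlantic_ocean is a hypothetical basin/ocean mask value supplied by
    the caller; the basin's longitude limits are not specified in the text.
    """
    if not is_atlantic_ocean:
        return False
    if experiment == "TAU30":   # whole tropical Atlantic, 30S-30N
        return -30.0 <= lat <= 30.0
    if experiment == "TAUEQ":   # equatorial band, 5S-5N
        return -5.0 <= lat <= 5.0
    if experiment == "TAUBE":   # 30S-10S, Greenwich meridian to the African coast
        return -30.0 <= lat <= -10.0 and lon >= 0.0
    raise ValueError(f"unknown experiment: {experiment}")


def stress_seen_by_ocean(experiment, lat, lon, is_atlantic_ocean,
                         tau_model, tau_erai):
    """Wind stress passed to the ocean at one grid point.

    A hard-edged mask is assumed here; any smoothing applied at the region
    boundaries is not described in this section.
    """
    if in_replacement_region(experiment, lat, lon, is_atlantic_ocean):
        return tau_erai
    return tau_model
```

Only the momentum flux is switched in this sketch, which mirrors the statement above that all other coupling fluxes are left unperturbed.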

Model description and specificities according to the common protocol
Atmospheric initial conditions are taken from the ERAI reanalysis for all models except IPSL, which is initialised from its own atmospheric equilibrium state. For the ocean, four models start from the ORA-S4 ocean reanalysis (Balmaseda et al. 2013), while the Cerfacs and NorCPM models are initialised from other products. The former models are based on the NEMO ocean model at ORCA1 grid resolution, on which ORA-S4 is also based. The Cerfacs model features the highest horizontal resolution, and runs on an ORCA025 grid for the ocean component. Hence, to avoid problems arising from interpolation from the coarser ORA-S4 resolution to a finer resolution, the initial conditions were derived from the Glorys2v3 ocean reanalysis (Ferry et al. 2012), which has also been produced with NEMO on the ORCA025 grid. To assess the role of using a different initialisation product, CNRM has performed two control experiments: one starting from ORA-S4 and one starting from a version of Glorys2v3 interpolated to the ORCA1 grid. It has been verified that the conclusions of this study do not depend on the initialisation product used. In NorCPM, which is based on the NorESM model (Bentsen et al. 2013), the initialisation employs an ensemble Kalman filter (EnKF) data assimilation technique as in Counillon et al. (2014, 2016), where the SST anomalies are assimilated. In this case, only SSTs are initialised, as opposed to the ocean 3D temperature and salinity fields in the other five models. The assimilation is carried out once per
The CNRM and Cerfacs models are relatively similar. They mainly differ in resolution, the Cerfacs model being the high-resolution version of the CNRM model (although the two models have the same atmospheric vertical resolution). Other minor differences do not affect the outcome of this study (as shown in Goubanova et al. 2018): a slightly different ocean model version (but with a similar choice of physical parameterisations); the use of a different sea-ice model (GELATO for the CNRM model and LIM2 for the Cerfacs model); and lastly, the land surface scheme has been externalised in the CNRM version, but the model physics remains identical to that in Cerfacs. As suggested by Goubanova et al. (2018), given the region and timescales of interest, the differences in land surface and sea-ice implementation are unlikely to explain the differences in the SST drift between these two models.
In the ECMWF model (System 4), we only analyse seasonal forecasts (CTRL experiment). Simulations with wind stress replacement and long-term climatological simulation were not available.

Basin-wide evolution of biases over the Tropical Atlantic
A warm SST bias first appears at the African coast off Mauritania and in the Angola-Benguela region, and progressively extends over the equator in all models (Fig. 2). NorCPM presents a strong cold bias in other areas: given the removal of the mean tropical bias in these plots, this means that the model develops very intense errors in the Atlantic.
In the ECMWF and IPSL models, the bias is close to the equator but does not span it. The rate at which the SST bias develops is model dependent: in ECMWF and EC-Earth, the bias is still limited to the coastal region in month 3, whereas in CNRM, Cerfacs and IPSL, it has already spread to the equator in month 2. However, for all models, the pattern and intensity of the SST bias averaged over lead time 4-6 months in the CTRL experiments (Fig. 2, fourth column) are close to the long-term SST bias of the CLIM experiment (Fig. 2, last column). Note that, as already shown in many studies (e.g. Richter et al. 2014), the climatological SST biases in the tropical Atlantic are relatively similar amongst the models (Fig. 2, last column). The spatial correlation (Table 3) between the CTRL and CLIM SST bias patterns at months 4-6 after initialisation is greater than 0.75 for forecasts initialised in May and greater than 0.9 for forecasts initialised in February. The lower correlation found for May in EC-Earth, CNRM and Cerfacs is largely explained by a different pattern of SST bias in the tropical north Atlantic, whereas the equatorial and southern patterns are very similar (not shown).
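A spatial correlation between two bias maps, such as the CTRL and CLIM patterns compared above, can be sketched as follows. This is an illustrative assumption about the metric: a centred correlation with cos(latitude) area weights on a regular latitude-longitude grid, with no land mask. The section does not state which weighting was used for Table 3, and the function name is hypothetical.

```python
import math


def pattern_correlation(map_a, map_b, lats):
    """Centred, area-weighted correlation between two maps on the same grid.

    map_a, map_b: lists of rows (one row per latitude), same shape.
    lats: latitude in degrees of each row; weights are cos(latitude).
    """
    weights, a_vals, b_vals = [], [], []
    for row_a, row_b, lat in zip(map_a, map_b, lats):
        w = math.cos(math.radians(lat))
        for a, b in zip(row_a, row_b):
            weights.append(w)
            a_vals.append(a)
            b_vals.append(b)

    w_sum = sum(weights)
    mean_a = sum(w * a for w, a in zip(weights, a_vals)) / w_sum
    mean_b = sum(w * b for w, b in zip(weights, b_vals)) / w_sum

    cov = sum(w * (a - mean_a) * (b - mean_b)
              for w, a, b in zip(weights, a_vals, b_vals))
    var_a = sum(w * (a - mean_a) ** 2 for w, a in zip(weights, a_vals))
    var_b = sum(w * (b - mean_b) ** 2 for w, b in zip(weights, b_vals))
    return cov / math.sqrt(var_a * var_b)
```

Because the means are removed, a uniform offset between the two maps (for instance a different mean tropical bias) does not change the result.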
The evolution of the CTRL RMSE of monthly mean SSTs averaged over the TAU30 area shown in Fig. 1 (Fig. 3a, b; red line) shows a progressive increase of the SST bias. Note that the CTRL bias is already quite large in the first month but it represents only one-third of the long-term RMSE (blue line). This suggests that the bias develops very fast, thus there is a need to look at higher frequency data to identify its origins (next section). At a 6-month lead time, CTRL RMSE has reached nearly 80% of its climatological value. Given that the spatial correlation shown in Table 3 is also very large this clearly confirms that the long-term SST bias settles in the first months (as suggested by Toniazzo and Woolnough 2014). This general picture is valid for all models (Figs. 2, 3). The results discussed here for the whole area of Tropical Atlantic also apply for the separate regions of intensified errors (ATL3 and OSE, discussed in detail below).
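The comparison of the CTRL RMSE with its climatological value can be illustrated with a short sketch. The cos(latitude) weighting, the absence of a land mask and the function names are assumptions made for illustration; the exact averaging used for Fig. 3 is not specified here.

```python
import math


def weighted_rmse(bias, lats):
    """Area-weighted root-mean-square of a bias map.

    bias: list of rows (one per latitude) of bias values in K.
    lats: latitude in degrees of each row; weights are cos(latitude).
    """
    num = 0.0
    den = 0.0
    for row, lat in zip(bias, lats):
        w = math.cos(math.radians(lat))
        for value in row:
            num += w * value * value
            den += w
    return math.sqrt(num / den)


def fraction_of_climatological_rmse(ctrl_bias, clim_bias, lats):
    """Ratio of the forecast (CTRL) RMSE to the long-term (CLIM) RMSE."""
    return weighted_rmse(ctrl_bias, lats) / weighted_rmse(clim_bias, lats)
```

A ratio close to 0.8 at a 6-month lead time would correspond to the "nearly 80%" quoted above.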
To overcome the limited ensemble size and number of start dates (3 members and 10 start dates) used in this study, we investigate the robustness of the spatial pattern of SST bias identified in Fig. 2 for each model. Dots in Fig. 2 indicate grid-points where the sign of the SST bias is consistent in all members for all years. For all models, there are dots for nearly all points where the bias exceeds 0.5 K. This clearly confirms that the ensemble size is large enough to robustly capture the SST warm bias.
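The sign-agreement test behind the dots can be sketched in a few lines. The data layout (one bias map per member and year, stored as nested lists) and the function names are illustrative assumptions.

```python
def signs_agree(values):
    """True when every value is strictly positive or every value is strictly negative."""
    return all(v > 0 for v in values) or all(v < 0 for v in values)


def stipple_mask(bias_maps):
    """Grid of booleans marking points where all forecasts share the bias sign.

    bias_maps[k][j][i] is the bias of forecast k (one member of one year)
    at latitude index j and longitude index i.
    """
    n_lat = len(bias_maps[0])
    n_lon = len(bias_maps[0][0])
    return [
        [signs_agree([m[j][i] for m in bias_maps]) for i in range(n_lon)]
        for j in range(n_lat)
    ]
```

With 3 members and 10 years, `bias_maps` would hold 30 maps per lead time, and a dot is drawn only where all 30 agree.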
Contrary to the SSTs, the zonal wind stress and precipitation RMSE (Fig. 3c-f) are much closer to the CLIM RMSE after only 1 month for May initialised forecasts. The picture is somewhat different for February initialised forecasts, where the first-month CTRL RMSE is significantly smaller than the CLIM RMSE.
The spatial pattern of wind stress errors is more model-dependent than that of SST (Fig. 4, 1st and 2nd columns). All models tend to have a westerly zonal wind bias in early summer (weaker than observed zonal winds) over the western equator but with very different intensities and large-scale patterns. The northern hemisphere trade winds tend to be weaker than observed in all models except IPSL. The shape and amplitude of the spatial pattern of meridional wind stress error are also model-dependent. There is a tendency to overestimate the northward wind stress in the southern hemisphere, but the exact location of this overestimation depends on the model. For ECMWF, EC-Earth, CNRM and Cerfacs, there is a clear northward positive wind stress bias north of Cape Frio (17°S). On the contrary, in IPSL and NorCPM, there is a southward bias (weaker than observed) in northward wind stress. However, it has to be noted that in ERA-Interim the northward meridional wind along the coast tends to be overestimated compared to QuikSCAT (Fig. 13 in Goubanova et al. 2018), thus this southward bias may depend on the reference chosen. The precipitation biases are much more consistent than the wind stress bias among the models. There is a clear overestimation of simulated precipitation over the equatorial ocean and a tendency to underestimate precipitation over surrounding continents (except for the CNRM model over north-western Africa).
Fig. 2 SST error (K) with respect to the HadISST data at lead-time of 1 month (first column), 2 months (second column), 3 months (third column) and averaged over lead-time 4-6 months (fourth column), and for a long-term non-initialised simulation averaged over May, June and July (fifth column). Each row shows a different model: ECMWF (first row), EC-Earth (second row), CNRM (third row), Cerfacs (fourth row) and NorCPM (fifth row) for February starts. Dots indicate grid-points where all members and years agree on the bias sign for that particular lead-time
From the analysis of the model bias, the link between wind stress and SST biases is not straightforward. The sensitivity experiments presented in Sect. 4 will give more insight into this question.
Fig. 4 Zonal wind stress (left), meridional wind stress (middle) and precipitation (right) in the control experiment (contours, every 0.02 N m−2 s−1 for wind stress components and every mm day−1 for precipitation) and its error with respect to the ERA-Interim data for wind stress and to GPCP for precipitation (shaded), averaged over lead-time 4-6 months (May, June and July) for February starts
For the simulated variables considered in this section, the tropical Atlantic biases settle very robustly in the first few months after initialisation. This validates the utility of the Transpose-CMIP method for understanding the coupled climate model biases in the region. The fast development of SST biases shown here justifies the drift analysis at higher frequency which will be performed in the next section.

Analysis of daily data over key regions in the tropical Atlantic
In multi-model approaches, the analysis of daily data over a large domain presents a challenge given the amount of data to process. To ease this analysis, daily data have been spatially averaged by each modelling center over two boxes (black boxes on Fig. 1): the ATL3 box (3°S-3°N, 20°W-0°E) to assess the equatorial region bias and a large box over the south-eastern open Atlantic Ocean (OSE, 20°S-5°S, 0°E-10°E) to assess the large SST biases. Figure 5 shows the daily evolution of SST biases for February and May forecasts for both boxes. It is clear that the time scale of bias development greatly depends on region and on start date. Over OSE, the drift is rather constant in all models and both start dates. This may suggest that the processes explaining the drift relate more to a progressive integration of biases. In contrast, over ATL3, the error in CLIM simulations has generally a large seasonal cycle (except in IPSL) and the initialised forecasts tend to follow this seasonality. Forecasts initialised in February generally show a weak initial bias (less than 1 °C), until June when it grows quickly over the next 2 months for Cerfacs, CNRM, EC-Earth and NorCPM. This suggests that, in these models, the processes controlling the bias adjustment are different in the two boxes. For three of these models (CNRM, NorCPM and Cerfacs), the biases evolve also in June and July for May starts, indicating that the drift evolution depends on the annual cycle and less on pre-existing biases. Specifically, it suggests that the warm SST bias can be regarded as a weak or insufficient cold tongue development. The picture is different for the ECMWF and IPSL models for both start dates and also for EC-Earth for May initialised forecasts. In ECMWF and IPSL models, the equatorial SST bias is weak compared to the other models (Fig. 2, fourth column) and there is no clear change in the drift speed during the simulation. 
In EC-Earth the cold tongue development is better captured in the forecasts initialised in May, while it is too weak in those initialized in February. The May initialization results in an improved representation of the ATL3 cold tongue in this model.
As the models are initialised from estimates of observed SSTs, the evolution of the biases in surface heat flux can help in understanding the role of fluxes in setting the SST bias. For the surface heat fluxes, the uncertainty in derived observations is quite large, as discussed in Zuidema et al. (2016); for this reason, we compare the model fluxes with several products to quantify this uncertainty. The evolution of heat fluxes for the ATL3 and OSE regions is shown in Figs. 6 and 7, respectively. We consider the full fields rather than anomalies and apply a 9-day smoothing to the time-series to remove the large day-to-day variability. The sensible heat flux is not shown since it is much weaker than the other fluxes over the considered regions.
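The 9-day smoothing can be illustrated with a simple running mean. The exact filter is not specified in this section, so the centred boxcar window and the shrinking window at the ends of the series are assumptions, as is the function name.

```python
def running_mean(series, window=9):
    """Centred running mean of a daily time series.

    Near the ends of the series the window is truncated to the available
    days (an assumption; other edge treatments are equally plausible).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number of days")
    half = window // 2
    smoothed = []
    for t in range(len(series)):
        lo = max(0, t - half)
        hi = min(len(series), t + half + 1)
        chunk = series[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

For a daily net heat flux series in W m−2, `running_mean(flux)` damps the day-to-day variability while keeping the seasonal evolution.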
Over ATL3 (Fig. 6), the simulated surface net heat flux is underestimated from the beginning of the simulations for February initialised forecasts, except in the ECMWF model. For models with a warm bias (IPSL, Cerfacs and CNRM), this suggests that surface heat flux cannot explain the initiation of the warm SST bias and is rather a consequence of the SST biases. This is corroborated by the fact that the underestimation of surface heat fluxes is correlated with the SST bias. In NorCPM, the initial cold bias could be well explained by the underestimated net heat flux associated with a latent heat flux underestimation. In EC-Earth, the SST bias is very weak until May. In the meantime, the net heat flux is underestimated after only a few days due to weaker than observed solar radiation.
However, it is necessary to look at all the surface heat flux components individually, since the solar heat flux could warm the mixed layer and the other fluxes could overcompensate such an excess of heat. The surface solar flux is underestimated in all models from the start, except in IPSL and ECMWF. In these two models, the SST bias remains very weak until May, therefore the weak excess in surface solar radiation cannot be directly related to the SST biases in these models. In NorCPM, the surface solar flux is underestimated in the first month, but is within the range of observations for the next 2 months. This could explain the initial cold drift and the warming in March and April. Then from May to August the surface solar heat flux is severely underestimated and this cannot explain the warm drift intensification. Similarly, in all other models the surface solar heat flux is underestimated from the start and cannot explain the initial warm SST drift. The surface latent heat flux is within the range of observations or overestimated in all models from the start. This means that there is an excess energy loss by the ocean and this cannot explain the warm SST bias, either; it rather responds to the warm bias and ameliorates it. The simulated net longwave heat flux is within the range of observed estimates and is a much weaker flux (note that the plot scales are different). In the Cerfacs and CNRM models, the longwave flux is less (in absolute value) than in other models and probably underestimated at least from June. The underestimation is of the order of ~10 W m−2 whereas the net heat flux bias is 3 to 4 times greater. The analysis of May starts yields similar results (not shown).
In conclusion, over ATL3, the surface heat fluxes cannot explain the initial warm SST drift in any of the models considered here.
Over OSE, the amplitude of the annual cycle in net surface heat flux is much larger than over ATL3, with a net ocean warming by surface heat flux in the boreal winter and a net ocean cooling in the boreal summer (Fig. 7). The transition from warming to cooling is relatively realistic in all models until June, when models do not agree on the net heat flux bias. Then, they tend to underestimate the net surface heat flux. The latent heat flux is relatively close to estimates in the first few days and starts to increase (in absolute value), suggesting a larger surface heat loss from ocean to atmosphere than in observations, thus exerting a compensating (cooling) effect to the SST warming. The same holds for the net long-wave heat flux. On the contrary, the solar flux is overestimated in the first few days in three models (CNRM, Cerfacs and EC-Earth); an excess in solar heat flux that could generate an initial SST warming. This is not true for NorCPM but is consistent with the initial cold bias of this model over OSE. Overall, this suggests that local feedbacks occur in the region on very short time-scales as shown in Toniazzo and Woolnough (2014) for the CFS model.

Sub-surface temperature drift
The surface flux analysis suggests that surface heat fluxes are not the only drivers of the SST warm biases. The biases develop in regions of upwelling, where the characteristics of subsurface waters drive the ocean surface biases (Wahl et al. 2011). For this reason, investigating the bias evolution of the temperature at the ocean subsurface can be indicative of the processes driving the setting of SST warm errors. Figure 8 shows the evolution of the ocean temperature biases on a vertical section from the surface to a depth of 200 m along the equator and along the southern African coast (as in Goubanova et al. 2018). This reveals commonalities and differences in the way that the bias develops. In most of the models, the temperature bias in the Angola-Benguela Frontal Zone (ABFZ) around 15°S starts to develop in February at the surface (and to a greater extent in CNRM and IPSL). Then the surface bias strengthens in April and May.
Fig. 8 Temperature error (K) in CTRL with respect to the ORAS4 data for a lead-time of 1 month (first column), 2 months (second column), 3 months (third column) and averaged over lead-time 4-6 months (fourth column), for the EC-Earth model (first row), the CNRM model (second row), the Cerfacs model (third row), the IPSL model (fourth row) and NorCPM (fifth row) for February starts. Plots show cross-sections along the equator as a function of depth and degrees longitude (the left panel of each pair), and along the southwestern coast of Africa as a function of depth and degrees latitude (the right panel of each pair). Data along the equator are averaged within 2°S-2°N and data along the coast are averaged within 2° off the coast. The solid (resp. dashed) black line is the depth of the 20 °C isotherm in ORAS4 (resp. in the model considered)
For NorCPM, whose behavior is quite different to the rest of the models, it is hard to conclude on this point given that the temperature bias is already quite large in the first month, probably due to the specific initialisation method as discussed in Sect. 2.2.
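The depth of the 20 °C isotherm drawn in Fig. 8 can be estimated from a single temperature profile by linear interpolation between model levels. The sketch below is illustrative only: the function name, the profile layout and the choice of the first downward crossing are assumptions, not the procedure used to produce the figure.

```python
def isotherm_depth(temps, depths, target=20.0):
    """Depth (m) of the first downward crossing of the target temperature.

    temps: temperatures in degrees C at each level, surface first.
    depths: level depths in m, increasing downward.
    Returns None if the profile never crosses the target from above.
    """
    for k in range(len(temps) - 1):
        t_up, t_down = temps[k], temps[k + 1]
        if t_up >= target > t_down:
            # Linear interpolation between the two bracketing levels.
            frac = (t_up - target) / (t_up - t_down)
            return depths[k] + frac * (depths[k + 1] - depths[k])
    return None
```

For example, a profile of 28, 24, 16 and 12 °C at 0, 50, 100 and 200 m gives a 20 °C isotherm depth of 75 m; a warm subsurface bias would push this depth further down.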
At the equator (consistently with Fig. 2), the SST bias is negligible during the first two months in all models. Nevertheless, a positive temperature bias (1-2 °C) has already developed at the thermocline depth for all models in February. For EC-Earth, IPSL and CNRM, this subsurface warming is located to the west, whereas Cerfacs and NorCPM exhibit a larger bias extending along the equatorial thermocline. The differences in the thermocline biases between CNRM and Cerfacs are likely due to the different ocean reanalysis used for initialisation. Indeed, the CNRM CTRL experiment initialised from the Glorys2v3 product as in Cerfacs shows that the temperature bias in the first month is very similar to that in the Cerfacs CTRL (not shown). For the CNRM and Cerfacs models, there are differences in biases during the first month that are mainly due to the difference in initialisation product; afterwards, the behavior is relatively similar, validating the approach used in this study. The IPSL model, which lacks a warm SST bias on the equator (Fig. 2), shows a subsurface bias consisting of an initial cooling in the eastern part of the equator.
For most of the models, a strong warm bias ranging from 4 °C to more than 8 °C, together with a thermocline deepening, develops on the eastern equator 4 months after initialisation (May, June and July). For CNRM and Cerfacs this positive temperature bias becomes particularly large, exceeding 6 °C already by April. Figure 8 also shows that the equatorial thermocline and associated temperature biases that develop in the ocean subsurface in early spring seem to propagate eastwards along the equator, reaching the continent and moving southwards along the coast. The equatorial bias then connects with the ABFZ warm bias. This bias development and propagation process is more prominent in the CNRM and Cerfacs models, and less pronounced, though also present, in EC-Earth, IPSL and NorESM. There are also important differences concerning the timing of the formation and propagation of the subsurface temperature biases. In particular, for CNRM and Cerfacs this mechanism occurs faster than in the other models.
In summary, based on the drift analysis performed so far, the spring and early summer SST biases over the SETA are generated locally during the first month after the initialisation with a sudden surface warming, associated with excess solar radiation. At longer timescale (1-6 months, depending on the model considered) this local warming connects with the equatorial bias. At the equator, there is a deepening of the thermocline associated with a subsurface temperature warming initiated at the equator, which moves eastwards and then propagates along the African coast, ultimately merging with the ABFZ SST bias.
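The thermocline depth discussed above is diagnosed in Fig. 8 as the depth of the 20 °C isotherm. The following is a minimal sketch of how such a depth could be obtained from a single temperature profile by linear interpolation between model levels. It is not the authors' code: the function name, the level spacing and the two sample profiles are hypothetical placeholders, not ORAS4 or model data.

```python
def isotherm_depth(depths_m, temps_c, target_c=20.0):
    """Return the shallowest depth (m) at which the profile crosses target_c.

    depths_m: depths in increasing order (m).
    temps_c:  temperature (degC) at those depths.
    Returns None when the profile never crosses the target temperature.
    """
    for k in range(len(depths_m) - 1):
        t_up, t_dn = temps_c[k], temps_c[k + 1]
        # A crossing occurs when the target lies between two adjacent levels.
        if (t_up - target_c) * (t_dn - target_c) <= 0 and t_up != t_dn:
            frac = (t_up - target_c) / (t_up - t_dn)
            return depths_m[k] + frac * (depths_m[k + 1] - depths_m[k])
    return None


# Hypothetical placeholder profiles (degC), NOT reanalysis or model output.
depths = [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]
reference_profile = [27.0, 26.0, 22.0, 18.0, 16.0, 15.0]
model_profile = [28.0, 27.5, 25.0, 21.0, 17.0, 15.5]

d20_ref = isotherm_depth(depths, reference_profile)
d20_model = isotherm_depth(depths, model_profile)

# A positive difference means the model thermocline is deeper than the reference.
print(d20_ref, d20_model, d20_model - d20_ref)
```

With these placeholder profiles the model isotherm lies deeper than the reference one, which is the kind of thermocline deepening described for the eastern equator.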

Sensitivity experiments analysis
The role of momentum surface flux in driving the thermocline, subsurface temperature biases and subsequently SST biases has been proposed in many studies (Ding et al. 2015; Small et al. 2015; Voldoire et al. 2014; Koseki et al. 2018; Wen et al. 2017). However, these studies are all based on single-model analyses using different protocols, which makes it difficult to compare the amplitude of the wind stress impact. Moreover, the relative roles of equatorial and SETA wind biases in driving SST biases in the tropical Atlantic are not well established. For instance, in Voldoire et al. (2014) the equatorial wind has an impact on the SETA region SSTs, but such an effect has not been documented in other models. The novelty of this work is the realisation and analysis of common sensitivity experiments of wind stress replacement that help to better disentangle the role of wind stress biases in several regions and different models, as already discussed in Sect. 2.2 (see also Fig. 1; Table 2). In the following section, we analyse the impact of wind stress replacement in TAU30 (whole tropical Atlantic), TAUEQ (equatorial wind stress replacement) and TAUBE (replacement over the SETA region).

Impact of wind stress corrections on SST
The first column of Fig. 9 shows the impact of TAU30 experiment for the forecasts initialised in February (the impact is calculated as the TAU30-CTRL difference). For TAU30, there is a reduction of SSTs along the equator and over SETA in all models, though the intensity of the TAU30 cooling is model dependent and seems to be proportional to the magnitude of the original bias. It is around 1 °C in IPSL, NorCPM and EC-Earth (Table 4), whereas it reaches nearly 2 °C in CNRM and Cerfacs in ATL3. In all models, except EC-Earth, TAU30 results in a strong cooling in the Benguela region (32°S-20°S, 12°E-18°E) from 1.6 °C in the IPSL model to 3.4 °C in the CNRM model.
The respective impacts of equatorial and coastal wind stress on SST biases can be inferred by comparing the results from TAUEQ and TAUBE (Fig. 9, middle and last columns). For most of the models except Cerfacs, the wind correction over the SETA (TAUBE experiment) induces a local surface cooling near the coast. For the Cerfacs model, the wind bias is less intense from February to April (not shown), thus the impact of the correction is smaller than in the CNRM model. In particular, Goubanova et al. (2018) have shown that, due to the enhanced resolution, the model is able to better represent the local fine-scale atmospheric wind and ocean processes associated with coastal upwelling than its low resolution counterpart (CNRM). The SST response in TAUEQ (Fig. 9, middle column) is, in contrast, more widespread. In all models, the local impact of the wind stress replacement over the equator is of similar magnitude to the TAU30 experiment. This clearly demonstrates the main role played by the errors in equatorial wind stress on equatorial SST biases. Though the local impact is obvious, the remote effects are more model dependent. In particular, for IPSL and NorCPM, the equatorial wind stress correction does not heavily impact the SETA SST bias (Table 4). On the contrary, in the CNRM and Cerfacs models, and to a lesser extent in EC-Earth, the equatorial wind stress correction also impacts the SETA SST by cooling the ocean surface by around 4 °C. The effects of the equatorial and SETA wind corrections are relatively linear: adding the TAUEQ and TAUBE impacts matches the TAU30 impact (Table 4, last column) relatively well.
Fig. 9 The impact of replacing the wind stress on SST (K) averaged over May, June and July in the sensitivity experiments TAU30 (left), TAUEQ (middle) and TAUBE (right) compared to the CTRL experiment for February starts (lead time 4-6 months)
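The impact definition (sensitivity experiment minus CTRL) and the additivity of the TAUEQ and TAUBE impacts come down to a few lines of arithmetic. The following is a minimal sketch, not the authors' code. It assumes SST fields on a regular latitude-longitude grid stored as nested lists, uses cosine-latitude weights as a proxy for grid-cell area and ignores land masking. All function names and sample values are hypothetical placeholders rather than entries of Table 4; only the Benguela box bounds (32°S-20°S, 12°E-18°E) come from the text.

```python
import math

# Benguela box from the text: 32S-20S, 12E-18E (lat_min, lat_max, lon_min, lon_max).
BENGUELA = (-32.0, -20.0, 12.0, 18.0)


def box_mean(field, lats, lons, box):
    """Cosine-latitude-weighted mean of field[j][i] over a lat/lon box."""
    lat_min, lat_max, lon_min, lon_max = box
    total, weight_sum = 0.0, 0.0
    for j, lat in enumerate(lats):
        if not lat_min <= lat <= lat_max:
            continue
        w = math.cos(math.radians(lat))  # proxy for grid-cell area
        for i, lon in enumerate(lons):
            if lon_min <= lon <= lon_max:
                total += w * field[j][i]
                weight_sum += w
    if weight_sum == 0.0:
        raise ValueError("no grid points fall inside the box")
    return total / weight_sum


def impact(sensitivity, control, lats, lons, box):
    """Box-mean impact, defined as sensitivity experiment minus CTRL."""
    diff = [[s - c for s, c in zip(s_row, c_row)]
            for s_row, c_row in zip(sensitivity, control)]
    return box_mean(diff, lats, lons, box)


def additivity_residual(impact_tau30, impact_taueq, impact_taube):
    """(TAUEQ + TAUBE) minus TAU30; zero means the two impacts add up exactly."""
    return (impact_taueq + impact_taube) - impact_tau30


# Hypothetical placeholder SST fields (degC), NOT model output.
lats = [-34.0, -30.0, -26.0, -22.0, -18.0]
lons = [10.0, 14.0, 18.0]
ctrl = [[20.0, 20.0, 20.0] for _ in lats]
tau30 = [[18.0, 18.0, 18.0] for _ in lats]
taueq = [[19.0, 19.0, 19.0] for _ in lats]
taube = [[19.25, 19.25, 19.25] for _ in lats]

d30 = impact(tau30, ctrl, lats, lons, BENGUELA)
deq = impact(taueq, ctrl, lats, lons, BENGUELA)
dbe = impact(taube, ctrl, lats, lons, BENGUELA)
print(d30, deq, dbe, additivity_residual(d30, deq, dbe))
```

A residual that is small compared with the TAU30 impact would correspond to the "relatively linear" behaviour described above. On a real model grid, the cosine weights would normally be replaced by the model's own cell areas and a land-sea mask.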

Impact of wind stress corrections on subsurface temperature
We now investigate the effects of the equatorial wind replacement on the ocean subsurface. Consistently, in all models, the subsurface temperature biases appear later on the equator in TAUEQ (Fig. 10) than in the CTRL experiment, and are reduced drastically at a six-month lead time. The bias in thermocline depth in the eastern equatorial region is largely reduced in EC-Earth, CNRM, Cerfacs and IPSL. The effect of TAUEQ is weaker in NorCPM but the SST cooling in the model is also less widespread in May, June and July over the equator. It is clear that the SST bias is reduced from the equator to 15°S along the African coast (Fig. 9) for all the models. However, south of 15°S, there is no impact of the equatorial wind stress correction in NorCPM, IPSL and EC-Earth. In contrast, in the CNRM and Cerfacs models, the impact of equatorial wind correction reaches the ABFZ region. Through a mixed layer heat budget analysis, Goubanova et al. (2018) show that equatorial warm biases in CTRL are advected at the subsurface to the ABFZ region in those models. In TAUEQ, by reducing the warm bias over the equator, the water advected to the ABFZ region is colder and thus subsurface warm advection is reduced in TAUEQ compared to TAU30 in the CNRM and Cerfacs models. The local upwelling therefore brings relatively colder water to the surface than in the TAUBE experiment. Goubanova et al. (2018) also suggest that this overestimated warm advection is linked to exacerbated intraseasonal coastal Kelvin waves. This is consistent with Benestad et al. (2002), who show that, with a deeper thermocline, the intraseasonal Kelvin waves are characterised by higher amplitude and propagate faster. Even if the mechanism of advection of warm water from the equator is less obvious, it is also present in the EC-Earth and IPSL models, since the ocean temperature biases are also significantly reduced in TAUEQ.
Note, however, that the fast SST bias appearing over the ABFZ during the first month in all the models is consistently unaffected in the TAUEQ experiment. This indicates that local bias mechanisms, such as excessive solar radiation (shown in Fig. 7) and local wind stress errors, are at play.

Impact of wind stress corrections on precipitation
We now analyse the effect of wind stress replacement on precipitation (Fig. 11). Experiment TAU30 introduces a similar impact on precipitation across all models (Fig. 11, first column). Rainfall is reduced by 3-8 mm day⁻¹ over the equator and enhanced along the northern and southern edges of the ITCZ. Although the exact location of the region of increase is model dependent, it is often located on the western side of the basin, near the Brazilian coast. This means that SST reduction partly corrects the initial precipitation bias. There are few impacts on precipitation over surrounding continental areas; an exception is an impact on the Sahel region in the Cerfacs model, which is not significant. Impact on precipitation in TAUEQ is very similar to that in TAU30 in all models, reinforcing the idea that this is a local effect of SST changes. Not surprisingly, there is no effect on precipitation in TAUBE since we consider the late spring which is a dry season in the Benguela region. This is also the case in the experiments starting in May for the August-September-October average (not shown).
As the wind stress replacement has been applied to the ocean model only, the effect on precipitation is a consequence of changes in SST and surface current, instead of a direct consequence of the wind replacement. The change in equatorial SST has a clear local impact on precipitation, but there is no clear impact on the West African monsoon, except along the African coast. Several studies (Mitchell and Wallace 1992; Brandt et al. 2011; Caniaux et al. 2011; Giannini et al. 2003) have stressed the impact of the Atlantic cold tongue on the West African monsoon, but this is not noticeable in our analysis. This could be explained by the intermittent nature of the link between SST variability and Sahelian precipitation. Joly and Voldoire (2010) found that such a link was evident in observations before the 1970s, but not afterwards. Rodriguez-Fonseca et al. (2011) state that the link is modulated by the Atlantic multidecadal oscillation. In combination, these studies suggest that the link is not active during the period used in our study.
We may also wonder if the improved SST leads to a change in wind stress simulated by the atmospheric model. We have found a very limited impact on the simulated wind stress (not shown). This is consistent with the Richter and Xie (2008) study in which they show that in CMIP3 models atmospheric wind biases are similar in coupled simulations and in SST forced atmospheric simulations. SST biases and coupling feedbacks only amplify the atmospheric originated wind biases.
Fig. 11 The impact of replacing the wind stress on precipitation (mm day⁻¹) averaged over May, June and July in the sensitivity experiments TAU30 (left), TAUEQ (middle) and TAUBE (right) compared to the CTRL experiment for February starts (lead time 4-6 months)

Impact on key regions and time-scales
Looking at the time-scale of the wind replacement impact (Fig. 12), the similarity between TAU30 and TAUEQ in reducing the bias is also obvious. The impact of wind replacement on the ATL3 region is weak in early spring when the bias is weak and becomes large when the errors grow and this corresponds to the season when the Atlantic cold tongue develops. This clearly suggests that the Atlantic cold tongue is more intense when the wind stress is corrected for the ocean model. The IPSL model appears to behave differently since the bias is more intense in early spring before the cold tongue season and decreases afterwards. However, the wind replacement clearly reduces the bias. For the OSE region, the effect of wind correction is weak in all models except in the Cerfacs and CNRM models. In the Cerfacs model, the effect of equatorial wind correction alone explains most of the impact on OSE whereas in CNRM, the equatorial wind correction only partly explains the OSE impact (as can be seen in Fig. 9).
Over ATL3, the impact of the SST bias reduction on the net surface fluxes follows the SST change (Fig. 13). There is a clear increase in net surface downward heat flux in ATL3 in all models, implying a negative feedback that counteracts the initial SST cooling. This change in net surface heat flux is driven by the development of a positive latent heat flux anomaly (a negative downward flux, corresponding to a reduction in absolute flux) and by a parallel increase in downward solar heat flux. These increases are partly balanced by a reduction of the outgoing surface longwave flux resulting from a change in temperature. By decreasing the SST, the latent heat flux is thus weakened, and the solar heat flux is increased. As the surface albedo is not changed in these experiments, the change in solar heat flux over the ocean mainly results from a reduction in cloud cover. This change is consistent with a reduction of convection that probably explains the reduced precipitation (Fig. 11). Over ATL3, the impact on surface fluxes is consistent in all models, indicating its robustness.
(Figure caption, truncated in source) February starts. When SST error is reduced in the sensitivity experiment the difference between the two curves is filled

Impact on the Bjerknes feedback
The effect of wind replacement on the equatorial Atlantic is rather similar in all models even if the amplitude of the impact is model dependent (and depends on its original bias amplitude). We next investigate whether increased cold tongue development leads to a better representation of the Bjerknes feedback. This feedback process is considered to be one of the main drivers of interannual variability in the equatorial Atlantic (e.g. Keenlyside and Latif 2007), and is a positive ocean-atmosphere feedback that links wind stress, ocean heat content and SSTs. It can be viewed as a feedback loop based on three components: first, a cold SST in the east increases the zonal surface temperature gradient and increases the easterly wind stress τ_u in the western equatorial Atlantic (component 1: λ_{SST→τ_u}). Such a wind stress increase enhances the surface currents and drives the warm surface water to the west, thus decreasing the mixed layer depth H in the east (component 2: λ_{τ_u→H}). In turn, this shallower mixed layer favours the local cooling by reducing the mixed layer heat content and also favours vertical shear in the currents and thus the mixed layer cooling processes (component 3: λ_{H→SST}). Deppenmeier et al. (2016) have proposed a method to diagnose this feedback in reanalysis and CMIP5 models. Following this approach, the three components of this feedback have been calculated in the Transpose-CMIP experiments. As we have not diagnosed the mixed layer heat content in all models, it is approximated using the mixed layer depth. Details on the method are given in Appendix 1.
To derive mean correlation of the feedback, we have averaged the 10% of the grid points with the highest correlation. This allowed us to account for model deficiencies in representing the exact location of the impact as well as its change following the seasonal cycle. Correlations were also averaged over all start dates, years, members and lead time so as to increase the robustness of the diagnostic. The exact methodology used to calculate the Bjerknes components is also described in Appendix 1.
As wind stress has been replaced in the ocean model, the link between SST and the ocean wind stress (component 1) is broken by construction and cannot be studied here. The second component (λ_{τ_u→H}) is increased in four of the models, meaning that wind stress anomalies in the west are better correlated with mixed layer depths in the eastern basin (Fig. 14a). The main effect of the wind replacement is on the third component (λ_{H→SST}), which is clearly increased for all models in both TAU30 and TAUEQ (Fig. 14b). This means that mixed layer depth anomalies in the east are better correlated to SST anomalies locally in the wind stress replacement experiment. This suggests that there is an indirect [...] Dippe et al. (2018). Wen et al. (2017) have shown that the depth of the 20 °C isotherm is very sensitive to the surface wind forcing in ocean forced experiments. For most of the models in our experiments, the effect of wind stress replacement in TAU30 is a shallowing of the mixed layer in the eastern basin, with the exception of IPSL (Fig. 15). However, there is a more consistent effect on the zonal gradient of the mixed layer: in TAU30, the gradients in all models are of the correct sign in the range 20°W to 0° (Table 5). This was not the case for CNRM, Cerfacs and IPSL in the CTRL experiments. It would have been interesting to investigate surface currents, surface equatorial undercurrent and vertical shear to assess if the improved Bjerknes feedback is closely related to an improvement of the mean state. Unfortunately, these diagnostics are not available in all models. Nevertheless, this study is consistent with the results of Wen et al. (2017) in ocean forced models. The depth of the mixed layer is also sensitive to the wind forcing in atmosphere-ocean coupled models.
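The sign check on the zonal mixed layer gradient between 20°W and 0° can be illustrated with a simple least-squares slope. This is a minimal sketch under stated assumptions, not the diagnostic used for Table 5: the function names are hypothetical, the mixed layer depths are placeholder values taken along the equator, and "correct sign" is interpreted here simply as having the same sign as a reference slope.

```python
def zonal_slope(lons_deg, values):
    """Least-squares slope of values against longitude (units per degree east)."""
    n = len(lons_deg)
    mean_lon = sum(lons_deg) / n
    mean_val = sum(values) / n
    num = sum((x - mean_lon) * (v - mean_val) for x, v in zip(lons_deg, values))
    den = sum((x - mean_lon) ** 2 for x in lons_deg)
    if den == 0.0:
        raise ValueError("need at least two distinct longitudes")
    return num / den


def same_sign(a, b):
    """True when both slopes are non-zero and share the same sign."""
    return a * b > 0.0


# Hypothetical placeholder mixed layer depths (m) from 20W to 0, NOT model output.
lons = [-20.0, -15.0, -10.0, -5.0, 0.0]
reference_mld = [40.0, 35.0, 30.0, 25.0, 20.0]
ctrl_mld = [20.0, 22.0, 24.0, 26.0, 28.0]
tau30_mld = [36.0, 33.0, 30.0, 27.0, 24.0]

ref_slope = zonal_slope(lons, reference_mld)
ctrl_slope = zonal_slope(lons, ctrl_mld)
tau30_slope = zonal_slope(lons, tau30_mld)

print("CTRL sign matches reference:", same_sign(ctrl_slope, ref_slope))
print("TAU30 sign matches reference:", same_sign(tau30_slope, ref_slope))
```

In this made-up case the CTRL gradient has the wrong sign and the TAU30 gradient has the right one, mirroring the qualitative behaviour reported for CNRM, Cerfacs and IPSL.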
The Bjerknes feedback analysis highlights the relevance of western equatorial Atlantic wind stress in explaining SST variability. Figure 16 relates the SST change between CTRL and TAU30 in ATL3 in May-June-July to the amplitude of wind stress correction in the western Atlantic in May-June. Stars indicate the multi-year multi-member mean and dots the individual members for each year. There is a clear correlation between the intensity of the wind correction and the SST impact, both on the mean climate considering the multi-model ensemble and on the interannual variability. The strength of interannual variability is also relatively similar in all models.

Summary
In this study we use a multi-model framework to investigate the processes involved in setting the SST bias in the Tropical Atlantic when starting from an observed state. The development of the bias as it grows from an initialized state is quantified using 6-month-long seasonal predictions, initialized from observations in February and May for the period of 2000-2009. This is the so-called "Transpose-CMIP" approach, where coupled atmosphere-ocean climate models are run in weather-forecast mode. The approach allows a detailed evaluation of the processes that are responsible for the development of the biases in terms of the time scales at which different biases appear. The fully developed climatological bias is quantified using long-term simulations, which show that the tropical Atlantic SST bias develops within a few months. Wind stress, precipitation and surface heat flux biases tend to reach their climatological biases similarly quickly (within the first month for May initialisation and 2-3 months for February initialisation). The patterns of SST biases, characterized by an eastern equatorial warming which extends to the south-eastern basin, are very similar among all models. Precipitation biases, with a wet bias over the equator and a generally dry bias over the surrounding continents, are also very similar among the models. In all models, the equatorial SST bias mainly emerges in spring and is associated with an anomalous thermocline structure along the equator, hence it can be interpreted as a weak cold tongue development. Over the south-eastern open ocean (OSE) region, the SST bias develops more progressively in all models, pointing to an accumulation of an excess of energy. Accordingly, models tend to simulate an excess in solar heat fluxes in the first few days of the forecasts, after which they decrease below observations. The excessive solar fluxes can generate an initial SST warming, which could be further sustained through local feedbacks with the ocean.
Wind stress errors are model dependent. There is no straightforward link between wind stress and SST errors that can be inferred from the control experiments.
In order to quantify the role of the wind stress in setting the SST bias, a set of sensitivity experiments has been performed, in which the model wind stress is replaced with ERA-Interim reanalysis wind stress. This wind stress correction is applied to three key regions: the whole Tropical Atlantic (TAU30), the equator (TAUEQ), and the coastal Angola-Benguela upwelling regions (TAUBE). The comparison among the three cases allows for identification of local versus remote effects of wind stress on the generation of the SST bias. The main conclusions of the sensitivity experiments are as follows:
• Correcting the wind stress in the Tropical Atlantic in TAU30 leads to a reduction in the SST bias over the equator and to a better simulated cold tongue development. The improvement is proportional to the magnitude of the westerly zonal wind stress bias on the equator. The thermocline is much better represented in TAU30, confirming the leading role of wind in controlling mixed layer depth along the equator, as shown by Wen et al. (2017). Local erroneous zonal wind stress is a major source of bias over the equator in all models considered, since TAUEQ impacts are similar to TAU30 impacts on the equator.
• The errors over OSE are more model dependent. They are due to the weaker meridional wind stress along the coast in NorCPM and IPSL; to advected errors from the equator in Cerfacs; and to both local and remote wind stress biases in CNRM. In EC-Earth, wind stress correction locally or at the equator has very limited impact on the coastal biases, pointing to other sources of biases, such as surface heat fluxes, ocean stratification and dynamics.
• The colder SSTs over the equator in the wind corrected experiments lead to less convection, less cloud cover and less precipitation, thus reducing precipitation biases. However, dry biases over the continents are not robustly affected; in particular, there is no clear impact of improved SSTs on the West African monsoon, in contrast to suggestions from earlier studies (Mitchell and Wallace 1992; Brandt et al. 2011; Giannini et al. 2003), except in the Cerfacs model.
• Consistently with Dippe et al. (2018), the improved equatorial mean state leads to a clear intensification of the Bjerknes feedback; in particular, the effect of mixed layer depth anomalies on SSTs is enhanced in all models.

Discussion
Our multi-model study has allowed us to highlight similarities and differences in the processes driving the SST bias in global climate models. However, the results may depend on the sensitivity experiment setup used. When we replace the wind stress directly in the ocean model, the surface turbulent heat fluxes are calculated using the modelled wind stress coming from the atmospheric component, which may lead to an inconsistency between momentum and turbulent heat fluxes. Other options could be more adequate. For instance, nudging the atmospheric wind towards reanalysis data would overcome this problem of momentum flux inconsistency (Voldoire et al. 2014), but the drawback of this method is that the atmospheric nudging also perturbs the atmospheric flow. Thus the impact of SST changes on atmospheric flow is less reliable. Given that atmospheric nudging was not easily implementable in some of the models involved in the project, the first approach has been retained. Another drawback of the experimental protocol is the choice of the ERA-Interim reanalysis as the reference data set. For instance, QuikSCAT (Mears et al. 1999) satellite measurements are known to provide more reliable estimates of the wind stress along the coast. In Fig. 4, the sign of the meridional bias in the south-eastern Atlantic along the African coast depends on the model. However, it has previously been shown in Goubanova et al. (2018, Fig. 13 therein) that ERA-Interim largely overestimates the wind stress in this region, thus the underestimation found for the IPSL and NorCPM models may not be robust. Using QuikSCAT in sensitivity experiments would have been a better choice, but this data set does not provide full coverage at a daily time scale as required for our experimental protocol. A hybrid product might have been the best choice, but such a product was not available and well-validated when the protocol was set up and implemented.
Separate sensitivity experiments would be needed to disentangle the role of the solar heat fluxes and the associated feedbacks in the ocean mixed layer in the south-east region. The Transpose-CMIP protocol is shown to be an appropriate framework to perform such experiments. A more precise evaluation of the models in the SETA coastal region would be worthwhile, although this would require improvement to the initialisation method so as to start from an unbiased state. Such an analysis also requires focussing on the coastal region using tools adapted to each model grid, and such an analysis is not straightforward in a multi-model context. This multi-model analysis allows us to identify the robust and model-dependent processes at play in setting the tropical Atlantic warm SST biases. Over the SETA region, it highlights that the precise mechanisms might differ amongst the models. To better understand the detailed process, one needs to look at each model in a more specific way as done for the Cerfacs model in Goubanova et al. (2018).
Given the improved mean state and variability, we may wonder if the SST forecast skill is improved in the sensitivity experiments. The anomaly correlation with observed SST in ATL3 is indeed generally improved in the sensitivity experiments, but with only 10 years of simulations it is difficult to draw robust conclusions on this. This result has thus to be confirmed in longer experiments. Nevertheless, the importance of wind variability in driving SST variability is a feature that should be exploited to improve seasonal forecasts. To this aim, Toniazzo and Koseki (2018) have developed an anomaly coupling method that is currently being tested.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix 1: Bjerknes feedback indices
The methodology closely follows the method described in Deppenmeier et al. (2016). To assess a component λ_{X→Y}, X is averaged over a specific box (that depends on the component) and correlated to Y on a grid point basis. The correlation is calculated for each month using a time series over all years and members of the forecast for this month. Then the 10% of the grid points with maximum correlation in a box that depends on the component (see below) are averaged to extract the maximum correlation in the models. The box is chosen as in Deppenmeier et al. (2016) and corresponds to the region where the corresponding feedback is found in reanalysis data. Averaging the 10% of grid points with the highest correlation allows us to take into account that the exact location of the feedback is dependent on the model and season. Then the correlation is averaged over all months and start dates independently of the lead-time so as to increase the signal-to-noise ratio. As only February and May initialised forecasts are available, the feedbacks are evaluated on spring and summer seasons, when they are shown to be the strongest in reanalysis (Deppenmeier et al. 2016).
• For the second component (λ_{τ_u→H}), wind stress anomalies are averaged in the WA4 box (4°S-4°N, 40°W-20°W) and correlated to mixed layer depth anomalies with a 1-month lag in the EA4 box (4°S-4°N, 20°W-10°E). This component relates to non-local processes between western equatorial Atlantic wind stress and eastern mixed layer depths.
• For the third component (λ_{H→SST}), mixed layer anomalies are averaged over the EA4 box and correlated to SST in the same box without any lag. This component is thus based on more rapid local processes linking mixed layer properties and SSTs.
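The procedure described in this appendix (box-average X, correlate it with Y at each grid point, then average the 10% of grid points with the highest correlation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of Deppenmeier et al. (2016) or of this study. The function names are hypothetical, the inputs are placeholder series, and the pairing of samples (including the 1-month lag for the second component) is assumed to be done by the caller.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sample lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def component_strength(x_box, y_grid, top_fraction=0.10):
    """Mean of the highest grid-point correlations between x_box and y_grid.

    x_box:  box-averaged X anomalies, one value per sample (year x member).
    y_grid: one list of Y anomalies per grid point, aligned with x_box.
            For a lagged component the caller pairs X from month m-1
            with Y from month m before calling this function.
    """
    corrs = sorted((pearson(x_box, y) for y in y_grid), reverse=True)
    n_top = max(1, round(top_fraction * len(corrs)))
    return sum(corrs[:n_top]) / n_top


# Hypothetical placeholder series, NOT model output: 10 samples, 20 grid points.
x = [float(k) for k in range(10)]
in_phase = [2.0 * v + 1.0 for v in x]   # correlation +1 with x
out_of_phase = [-v for v in x]          # correlation -1 with x
grid = [in_phase, in_phase] + [out_of_phase] * 18

strength_this_month = component_strength(x, grid)
print(strength_this_month)
```

In this sketch the function returns the value for one calendar month; following the text, such monthly values would then be averaged over all months and start dates to increase the signal-to-noise ratio.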