1 Introduction

State-of-the-art, high-resolution, high-complexity general circulation models (GCMs) provide a sophisticated representation of the main components of the Earth system: the ocean, atmosphere, biosphere, and sea ice. As such, they are valuable tools for studying climate change on decadal to centennial timescales (e.g. Collins et al. 2011; Gent et al. 2011; Martin et al. 2011; Hazeleger et al. 2012; Flato et al. 2013; Hurrell et al. 2013; Hardiman et al. 2017). However, these models are extremely computationally expensive. They are therefore impractical for running the long simulations required to spin up the components of the Earth system that evolve on millennial timescales, such as deep ocean circulation (England 1995) and ocean biogeochemical cycles (Falkowski et al. 2000; Key et al. 2004). GCMs with significantly faster operational speeds, as a consequence of reduced spatial resolution and/or longer timesteps, are therefore important tools for Earth system modellers who run long integrations (e.g. to study palaeoclimate, the carbon cycle and ice sheet evolution). These models allow multi-millennial climate simulations to be conducted whilst still retaining considerable detail in the feedbacks between different Earth system processes. Examples include FAMOUS (Jones et al. 2005; Smith et al. 2008; Williams et al. 2013), the CSIRO Mk3L climate system model (Phipps et al. 2011), and low-resolution versions of CCSM3 (Yeager et al. 2006) and the GFDL coupled climate model (Dixon et al. 2003).

Models are neither complete nor perfect representations of reality because a number of physical processes are approximated, parameterised, or omitted altogether (Gupta et al. 2012). Drift is therefore an inherent problem in coupled GCMs that can occur even when no external forcing is applied (Covey et al. 2006; Gupta et al. 2012). Unforced drift can be characterised on two timescales: “fast adjustments” occur on annual to decadal timescales and typically relate to surface imbalances in heat, freshwater and (sometimes) biogeochemical fluxes; “slow adjustments” occur on centennial to millennial timescales and involve the response of deep ocean circulation and ocean biogeochemical cycles to the surface imbalances (Gupta et al. 2012). This study focuses on multi-millennial salinity drifts that primarily arise because of inaccuracies in the formulation of the hydrological budget, which lead to the non-conservation of salt or freshwater (Bryan 1998; Gupta et al. 2012). Many GCMs experience salinity drifts because they operate without dynamical ice sheets to return freshwater stored as ice back to the oceans via iceberg calving (e.g. Gordon et al. 2000; Pardaens et al. 2003; Johns et al. 2006; Gupta et al. 2012). Some older GCMs also operate without a river routing system, which inhibits continental precipitation from being returned to the oceans (e.g. Bryan 1998). The resultant drifts may be small when considered over a few decades or even centuries. However, over multiple millennia they can lead to large shifts in climatic and oceanic regimes (Bryan 1998; Pardaens et al. 2003; Covey et al. 2006).

Accurate simulation of the hydrological budget at both the global and regional scales is an influential factor in determining the strength and structure of meridional overturning circulation (MOC) and the location of deep water formation regions. In turn, these large-scale features are particularly important when studying ocean biogeochemical processes, the oceanic carbon cycle, and ocean heat transport. Surface freshwater fluxes reflect the balance of sea ice formation and melt, precipitation minus evaporation (P−E), and continental runoff. Together with ocean advection, they control sea surface salinity distributions. Salinity gradients influence ocean stratification and vertical mixing; therefore, any drifts in surface salinity could eventually propagate in to the deep ocean, affecting density driven circulation and poleward heat transport (Pardaens et al. 2003; Jin et al. 2017). Even small imbalances in the surface climate can result in large MOC shifts if maintained for multiple millennia (Pardaens et al. 2003), with the potential for both a total collapse of the Atlantic MOC (AMOC) and invigoration of the Pacific MOC (PMOC) under constant pre-industrial boundary conditions.

One approach for alleviating hydrological drift and maintaining a stable climate in long model integrations is to apply an artificial flux adjustment as a constant, predetermined freshwater flux. However, this approach is inherently non-physical (Gupta et al. 2012). Very simple representations of physical processes not otherwise included in the model, whose implementation is explicitly aimed at reducing drift, are therefore preferable. For example, HadCM3 applies a spatially uniform freshwater flux, equivalent to 0.1 mm day⁻¹, to account for rainfall that runs into internal drainage basins (Gordon et al. 2000). Additionally, all versions of the UK Met Office Unified Model (UM) up to version 9 include a calibrated iceberg melt freshwater field to balance water losses associated with the accumulation of snow on the Antarctic and Greenland ice sheets (e.g. Gordon et al. 2000; Smith et al. 2008; Collins et al. 2011; Martin et al. 2011), which is discussed further in Sect. 2.1.2.

In this study, we aim to minimise the effects of non-closure of the hydrological budget and obtain the best possible representation of both the strength and structure of large-scale pre-industrial ocean circulation in long (10,000-year) integrations with the FAMOUS GCM. The latest generation of FAMOUS has been developed to provide increased and more dynamic Earth System capabilities. However, persistent salinity drifts on the order of 0.25 psu kyr⁻¹ result in a collapsed AMOC and invigorated PMOC when run lengths exceed 6000 years. To prevent (or at least reduce) this large drift, the hydrological cycle must be forcibly closed by applying a freshwater flux adjustment that corrects global mean ocean salinity. In this manuscript, we compare two different methods for neutralising unforced salinity drifts in the most recent generation of the model (Sect. 3). Despite both schemes successfully maintaining a steady global mean salinity, localised drifts and subsequent feedbacks alter the sea surface temperature, salinity and density in the northeast North Atlantic and northwest North Pacific Oceans, leading to a recurrence of the aforementioned abnormal MOCs. We therefore examine the effects of intrinsic biases in the surface climatologies of two different generations of the model in the context of the hydrological cycle and overturning circulation to address whether the switch between AMOC-dominance and PMOC-dominance is an inherent feature of long FAMOUS simulations (Sect. 4). The MOC abnormalities were only observed in the latest generation of the model, with an earlier version of the model maintaining a strong, though over-deep, AMOC and no PMOC (similar to the ocean state observed today). Thus, we conclude with a discussion of whether tuning could improve the accuracy of the simulated MOCs in the latest version of the model.

2 Methods

2.1 Model description

FAMOUS is a coupled ocean–atmosphere GCM based on the higher resolution HadCM3 (Gordon et al. 2000; Pope et al. 2000), a configuration of the UM version 4.5 (UM4.5; Jones et al. 2005; Smith et al. 2008; Smith 2012; Williams et al. 2013). The atmospheric component of FAMOUS is based on primitive equations and has a horizontal resolution of 7.5° × 5°, 11 vertical levels, and a 1-h timestep. The oceanic component is a rigid lid model, which has a 12-h timestep, 3.75° × 2.5° horizontal resolution, and 20 vertical levels that vary in thickness from 10 m at the surface to more than 600 m at depth. The uppermost 100 m of the ocean is divided into 7 levels and there are 13 layers in the first kilometre. The maximum depth of the ocean is 5500 m. The bathymetry of FAMOUS differs from that of HadCM3 in the North Atlantic Ocean and the Nordic Seas. The parent model has deep overflow channels in these regions, which improve the representation of North Atlantic Deep Water (NADW). However, these features are not included in FAMOUS because they were found to eliminate Atlantic-sector Antarctic Bottom Water (AABW) and unrealistically increase the strength of the AMOC. The horizontal resolution of FAMOUS does not permit flow through the Denmark Strait. Iceland has therefore been removed to increase northward heat transport in the Atlantic and prevent the unrealistic build-up of sea ice in the Nordic Seas. The spatial resolution of the model also means that the Bering Strait is shut to significant mass transport and effectively shut to tracer mass transport. This is of potential relevance to the ocean circulation drifts described in this study, because it reduces exchange between the Atlantic and Pacific Oceans through the Arctic. The ocean and atmosphere are coupled once per day. At the time of this study, FAMOUS simulates 400–600 model years per wall-clock day using 16 processors on Tier 2 (regional) and Tier 3 (local) High Performance Computers at the University of Leeds. This is an order of magnitude faster than HadCM3, which makes FAMOUS well suited to multi-millennial simulations (e.g. Smith and Gregory 2012; Gregoire et al. 2012, 2015) or large ensembles (e.g. Gregoire et al. 2011; Sagoo et al. 2013).
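To put the quoted throughput in context, the following minimal sketch converts it into the wall-clock time needed for a 10,000-year integration of the kind used in this study. The function name is hypothetical, and the calculation assumes uninterrupted running at a constant rate, which is an idealisation.

```python
def wall_clock_days(model_years: float, years_per_day: float) -> float:
    """Wall-clock days needed to integrate `model_years` at a constant throughput."""
    return model_years / years_per_day


run_length = 10_000          # model years (integration length used in this study)
slow, fast = 400.0, 600.0    # model years per wall-clock day (range quoted above)

print(wall_clock_days(run_length, slow))            # 25.0
print(round(wall_clock_days(run_length, fast), 1))  # 16.7
```

Under these assumptions, a single 10,000-year simulation occupies roughly two and a half to three and a half weeks of continuous computing.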

Multiple generations of FAMOUS exist, as outlined by Williams et al. (2013). Each FAMOUS simulation is denoted by a unique five-letter UM index. Here, we use the simulations XFHCC (Smith 2012) and XCVJO, which is an unpublished predecessor to XFHCU (Williams et al. 2013). The UM basis files for these simulations can be accessed via the Providing Unified Model Access (PUMA) service (http://cms.ncas.ac.uk/wiki/PumaService). Technical documentation and evaluations of model development work can be found in an ongoing special issue in Geoscientific Model Development (http://www.geosci-model-dev.net/special_issue15.html).

2.1.1 Land surface schemes in FAMOUS

All versions of FAMOUS published prior to Williams et al. (2013) used the Met Office Surface Exchange Scheme (MOSES) version 1 (Cox et al. 1999). MOSES1 calculates the surface-to-atmosphere fluxes of momentum, energy and water, and the vegetation-to-atmosphere fluxes of CO2, incorporating the physiological impact of atmospheric CO2, temperature and water vapour on photosynthesis and stomatal conductance (Cox et al. 1999; Valdes et al. 2017). The state of the land surface is defined in terms of four prognostic variables: temperature, canopy water, lying snow, and soil moisture (Cox et al. 1999). A significant drawback within the context of long-term Earth system modelling is that MOSES1 does not include the terrestrial carbon cycle or interactive vegetation (Williams et al. 2013; Valdes et al. 2017).

The latest generation of the model, optimised by Williams et al. (2013), includes a newer version of the land surface model, MOSES2.2 (Essery et al. 2001, 2003), as well as an update for the sea ice physics scheme and extra physics at the top of the atmosphere (Smith 2012). MOSES2.2 upgrades all aspects of the land surface exchange and surface radiation schemes compared to MOSES1 (Valdes et al. 2017). An additional advance was the introduction of a tiled representation of surface types on the sub-grid scale, which allows heterogeneous surface characteristics to be modelled (Essery et al. 2001; Valdes et al. 2017). To calculate the energy balance for each grid cell, MOSES2.2 weights the properties of the different surface types and calculates the fluxes for the average surface (Essery et al. 2003). A detailed description of the differences between MOSES1 and MOSES2.2 is provided by Valdes et al. (2017).

2.1.2 Salinity drifts in FAMOUS

Salinity drifts occur in multi-millennial simulations in FAMOUS because the water budget is not fully closed (Pardaens et al. 2003; Smith et al. 2008). The primary cause of this drift is the persistent accumulation of multi-year snow on the Greenland and Antarctic ice sheets without dynamic ice physics to allow transport to the coast or iceberg calving, resulting in insufficient freshwater being routed back to the global oceans. This is a feature of all versions of the UM4.5 family and is primarily addressed by the inclusion of a constant freshwater flux to represent iceberg melt (Smith et al. 2008). The meltwater field is calibrated in a standard pre-industrial control run to balance the model’s global salinity drift and is applied uniformly in the areas adjacent to the ice sheets where icebergs should occur. An additional issue is that the rigid lid ocean model requires the use of a virtual salinity flux (VSF) at the surface, rather than a direct water forcing (Pardaens et al. 2003; Smith et al. 2008). In FAMOUS, unlike other versions of the UM4.5 (e.g. Valdes et al. 2017), the VSF is calculated using the local sea surface salinity, which means that, although local effects may be more accurately represented, there is no guarantee that a globally balanced surface water forcing will translate into a globally balanced VSF.
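The consequence of computing the VSF from the local sea surface salinity can be illustrated with a toy calculation. The sketch below is not the FAMOUS code: it assumes the common form VSF = −S × F (with F the net freshwater input), a hypothetical two-cell ocean with equal cell areas, and an arbitrary reference salinity of 35 psu.

```python
import numpy as np

# Hypothetical two-cell ocean with equal areas (all values are illustrative).
area = np.array([1.0, 1.0])              # relative cell areas
fw_flux = np.array([1.0e-8, -1.0e-8])    # net freshwater input, m s^-1 (area integral is zero)
sss_local = np.array([33.0, 36.0])       # local sea surface salinity, psu
s_ref = 35.0                             # assumed constant reference salinity, psu


def virtual_salinity_flux(freshwater, salinity):
    """Assumed form of the VSF: freshwater input dilutes, so the sign is negative."""
    return -salinity * freshwater        # psu m s^-1


net_water = np.sum(area * fw_flux)
net_vsf_ref = np.sum(area * virtual_salinity_flux(fw_flux, s_ref))
net_vsf_local = np.sum(area * virtual_salinity_flux(fw_flux, sss_local))

print(net_water)       # balanced water forcing
print(net_vsf_ref)     # balanced VSF when a single reference salinity is used
print(net_vsf_local)   # non-zero: a net salt source despite balanced water forcing
```

In this example the freshwater is added where the surface is fresh and removed where it is salty, so the locally scaled VSF produces a net gain of salt even though the water budget itself is closed.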

Small drifts may also arise from evaporation over isolated inland seas, which have infinite capacity to fill up (from precipitation and continental runoff) and/or dry out (through evaporation), because the ocean’s rigid lid ensures that the volume of these water masses remains constant. Modelled salinities are therefore capped at 45 psu to prevent them from becoming unrealistically salty in isolated bodies of water, such as Hudson Bay and the Red Sea (Pardaens et al. 2003).

2.2 Experiment design

2.2.1 Salinity drifts

The global volume-weighted mean salinity drift in a 10,000-year pre-industrial equilibrium simulation with the standard version of FAMOUS-MOSES2.2 is 0.25 psu kyr⁻¹ (not shown). A drift of this magnitude is unacceptable in multi-millennial simulations because it leads to a collapsed AMOC and invigorated PMOC even under constant pre-industrial boundary conditions. Two different modifications were independently incorporated into the latest generation of the model in an attempt to minimise the salinity drift: SFLUX (Online resource 1) and VFLUX (Online resource 2). The SFLUX code modification sets a target for the global volume-weighted mean of salinity, calculates the difference between this target and the model value at each timestep, and applies a correction via a surface salinity flux. The salinity target was set to 34.65 psu to match the global volume-weighted salinity in the most recent standard spun-up version of FAMOUS (produced by Williams et al. 2013). In contrast, VFLUX calculates the global average VSF at the surface, and applies a small salinity tendency equally at every grid cell throughout the depth of the ocean to cancel out any net salinity forcing. This ensures that there is no volume-averaged drift in global salinity. Both simulations were run for 10,000 years with constant pre-industrial boundary conditions and were initialised from an unpublished simulation (XCVJO), which was a precursor to the standard FAMOUS-MOSES2.2 simulation (produced by Williams et al. 2013). In Sect. 3, the effectiveness of the two salinity drift modifications will be assessed and the impact of each modification on the model’s deep ocean circulation will be discussed.
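The logic of the two corrections can be sketched on a toy grid. The code below is a simplified illustration and not the modifications supplied in Online resources 1 and 2: the function names, the 3 × 2 grid, the cell volumes and the assumption that SFLUX removes the full deficit in a single step by spreading it uniformly over the surface layer are all hypothetical.

```python
import numpy as np


def volume_mean(salinity, volume):
    """Global volume-weighted mean salinity (psu)."""
    return float(np.sum(salinity * volume) / np.sum(volume))


def sflux_correct(salinity, volume, target):
    """SFLUX-style step: adjust only the surface layer (index 0) so that the
    global volume-weighted mean returns to `target`."""
    salt_deficit = (target - volume_mean(salinity, volume)) * volume.sum()
    corrected = salinity.copy()
    corrected[0] += salt_deficit / volume[0].sum()
    return corrected


def vflux_correct(salinity, volume, surface_tendency):
    """VFLUX-style step: apply the surface forcing, then add one uniform
    tendency to every cell at every depth that cancels its net effect."""
    net_forcing = np.sum(surface_tendency * volume[0])
    uniform_tendency = -net_forcing / volume.sum()
    forced = salinity.copy()
    forced[0] += surface_tendency
    return forced + uniform_tendency


# Toy ocean: 3 levels x 2 columns (values are illustrative only).
volume = np.array([[1.0, 1.0], [2.0, 2.0], [5.0, 5.0]])
salinity = np.array([[34.0, 35.5], [34.6, 34.9], [34.7, 34.8]])

after_sflux = sflux_correct(salinity, volume, target=34.65)
after_vflux = vflux_correct(salinity, volume, np.array([0.02, -0.005]))

print(volume_mean(after_sflux, volume))                                   # equals the target
print(volume_mean(after_vflux, volume) - volume_mean(salinity, volume))   # no net drift
```

The sketch highlights the structural difference between the schemes: the SFLUX-style step concentrates the whole correction in the surface layer, whereas the VFLUX-style step distributes a much smaller tendency over the entire ocean volume.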

2.2.2 Generations of FAMOUS

Two different generations of the model, FAMOUS-MOSES1 (FM1) and FAMOUS-MOSES2.2 (FM2), were also compared to investigate the effect that differences in the surface climatologies have on the hydrological cycle and large-scale ocean circulation. As with the salinity drift experiment (Sect. 2.2.1), both simulations were run for 10,000 years with constant pre-industrial boundary conditions. Both simulations were run using published model setups and were initialised from the end of 12,000-year (FM1 = XFHCC; Smith 2012) and 7000-year (FM2 = XCVJO) spin-ups, respectively. FM1 was run with a standard salinity drift modification that ensures that changes in global ocean salinity due to the VSF formulation are consistent with the overall climatic water budget; see Smith et al. (2008) for more details. However, if the water budget is drifting (e.g. because of snow accumulation or a poorly calibrated iceberg meltwater field), this modification does not eradicate the salinity drift. FM2 is the same simulation described in Sect. 2.2.1 that contains the VFLUX modification (see Table 1 for a summary of the key differences between the simulations presented in this study).

Table 1 Summary of the key differences between the simulations

In Sect. 4, results will be compared to HadCM3 because FAMOUS was originally tuned to reproduce both the equilibrium climate and climate sensitivity of the parent model (Jones et al. 2005; Smith et al. 2008; Smith 2012). The HadCM3 simulation (AAXZK) that we compare to is part of the MOSES1 pre-industrial control run conducted by Gordon et al. (2000), which was an ≈ 400-year run initialised with modern climatology, but forced with pre-industrial boundary conditions. The initial conditions for AAXZK (henceforth ‘HadCM3’) were the climate state from year 360 of the simulation described by Gordon et al. (2000). We compare to a climatology that has been produced after a 240-year continuation with constant pre-industrial forcing. In addition, we compare to observations from the World Ocean Atlas 2013 version 2 (Locarnini et al. 2013; Zweng et al. 2013), since subsequent model development may have taken newer FAMOUS simulations away from the parent model.

3 Influence of salinity drift correction on ocean circulation

In this section, we compare the effect that the two salinity drift correction schemes (SFLUX and VFLUX) have on global salinity and the MOC in the most recent generation of the model (FM2). We focus our analyses on the Atlantic and Pacific Oceans, because one of the most distinctive features of the modern MOC is the asymmetry between these two basins (Saenko et al. 2004). Deep waters are currently formed in the Nordic and Labrador Seas in the Northern Atlantic Ocean, and the Ross and Weddell Seas in the Southern Ocean (Schmittner et al. 2007). In contrast, there is no deep water formation in the Pacific Ocean, because the relatively low density of the surface water stabilises the water column and limits downward convection (Warren 1983; Menviel et al. 2012). Instead, modern Pacific overturning is dominated by weak, southwards flowing North Pacific Intermediate Water and northwards flowing AABW (Talley et al. 2003). However, the North Pacific has been postulated as a site of deep water formation in the geological past based on evidence from proxy records and modelling studies (e.g. Okazaki et al. 2010; Rae et al. 2014).

3.1 Evolution of salinity

Both salinity drift correction schemes are effective in maintaining a constant global salinity over the 10,000-year integration (Fig. 1). However, in both simulations, there is a redistribution of salt from the Atlantic basin into the Pacific basin. Specifically, the Atlantic Ocean freshens by ≈ 0.26 psu during the first 1500 years of both simulations. In SFLUX, the Atlantic salinity plateaus at 34.81 psu between years 1500 and 4500, before increasing by ≈ 0.09 psu between years 4500 and 5250. There is a further increase of 0.04 psu between years 5250 and 7000. The regional salinity then stabilises at ≈ 34.94 psu during the final 3000 years of the simulation. In VFLUX, the Atlantic salinity drifts by < 0.04 psu between years 1500 and 5750. The regional salinity then increases by 0.135 psu between years 5750 and 6750, before plateauing at ≈ 34.95 psu during the final 3250 years of the simulation. The Pacific Ocean salinity increases by ≈ 0.07 psu during the first 500 years of the SFLUX simulation. Continued salinification (≈ 0.12 psu) is simulated between years 500 and 3000, with a period of metastability between years 3000 and 4750. The basin freshens by ≈ 0.02 psu between years 4750 and 5250, with an additional 0.01 psu of freshening simulated between years 5250 and 7000. The regional salinity stabilises at ≈ 34.63 psu during the final 3000 years of the simulation. In VFLUX, the Pacific basin becomes ≈ 0.08 psu more saline during the first 750 years of the simulation. The regional salinity increases by a further ≈ 0.09 psu between years 750 and 5800 before freshening by ≈ 0.03 psu between years 5800 and 6750. The salinity plateaus at 34.6 psu during the final 3250 years of the simulation. Overall, the regional drifts are comparable between the two simulations, with the Atlantic freshening by 0.127 psu in SFLUX and 0.109 psu in VFLUX, and Pacific salinity increasing by 0.164 psu and 0.148 psu, respectively. The causes and effects of these drifts are discussed further in Sect. 3.3.
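A constant global mean is fully compatible with opposing basin-scale drifts, as the following toy calculation shows. It is a sketch only: the four-cell ocean, the basin masks, the volumes and the salinity changes are hypothetical and are not taken from the simulations.

```python
import numpy as np


def masked_volume_mean(salinity, volume, mask=None):
    """Volume-weighted mean salinity over the cells selected by a boolean mask."""
    if mask is None:
        mask = np.ones(salinity.shape, dtype=bool)
    return float(np.sum(salinity[mask] * volume[mask]) / np.sum(volume[mask]))


# Toy ocean of four cells: two "Atlantic" and two (larger) "Pacific" cells.
volume = np.array([1.0, 1.0, 2.0, 2.0])
atlantic = np.array([True, True, False, False])
pacific = ~atlantic

before = np.array([35.0, 35.0, 34.5, 34.5])
after = before.copy()
after[atlantic] -= 0.2   # salt removed from the smaller basin ...
after[pacific] += 0.1    # ... and the same salt content added to the larger basin

for label, mask in [("global", None), ("Atlantic", atlantic), ("Pacific", pacific)]:
    change = masked_volume_mean(after, volume, mask) - masked_volume_mean(before, volume, mask)
    print(f"{label}: {change:+.3f} psu")
```

Because the basin means are weighted by different volumes, equal and opposite changes in salt content appear as unequal changes in basin-mean salinity while the global mean is unchanged.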

Fig. 1 Volume-weighted mean salinity in SFLUX (solid line) and VFLUX (dashed line) calculated for the global ocean (black), Atlantic basin (red) and Pacific basin (blue). Data are calculated from annual climate means

3.2 Meridional overturning circulation

The maximum MOC strength was calculated between 20°N and 80°N and at depths greater than 250 m. At the start of both simulations, the maximum strength of the AMOC is 14–15 Sv (1 Sv = 10⁶ m³ s⁻¹; Fig. 2), broadly consistent with other FAMOUS control experiments and observations at the same latitude (Table 2). An initial decrease in strength of 4–5 Sv also occurs during the first 200 years of both simulations as a result of a coupling shock, because the model configuration matches the standard FM2 simulation (XFHCU, Williams et al. 2013), whereas the initialisation state is from an earlier, unpublished FM2 simulation (XCVJO), which has not been tuned or spun-up to a comparable steady-state. However, tests with different initialisation states suggest that this has little bearing on the final MOC configuration. In both simulations, the AMOC exhibits a period of metastability before it weakens to a maximum strength of 4–5 Sv. The duration of the period of metastability and the rate of AMOC collapse differ between the two simulations. In SFLUX, the maximum AMOC strength remains relatively stable (10–11 Sv) between years 200 and 1600 before gradually decreasing to a strength of ≈ 4 Sv 4750 years after the start of the simulation. In VFLUX, the AMOC is metastable for a longer period, fluctuating between 10 and 12 Sv until year 3800. A more rapid collapse then occurs, with a 7 Sv drop in strength simulated between years 3800 and 6000. This highlights the risk of carrying out model integrations that are too short. If the simulations had only been run for 1500 and 3500 years, respectively, the AMOC could have been misdiagnosed as having reached steady-state. However, the longer integrations demonstrate that this is not the case, with the AMOC only stabilising after a minimum run time of 5000 years. In both simulations, the initial maximum strength of Pacific overturning (6 Sv) is in good agreement with the strength of shallow subtropical gyre overturning in the North Pacific computed from hydrographic data (Talley et al. 2003). However, by the end of both simulations, a strong PMOC cell has become established. In SFLUX, the maximum PMOC strength at the end of the simulation is ≈ 15.5 Sv, with the greatest rate of increase occurring during the first 2000 years. In VFLUX, the maximum strength of overturning at the end of the simulation is ≈ 14.5 Sv and the greatest rate of increase occurs between years 3400 and 6000. The lagged evolution of the MOCs in VFLUX relative to SFLUX results from the weaker surface forcing that arises from the salinity drift correction being applied to the whole ocean volume as opposed to a single layer.
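The diagnostic described at the start of this section (the maximum of the overturning stream function between 20°N and 80°N at depths greater than 250 m) can be sketched as follows. The function name, array layout (depth × latitude, in Sv, positive for clockwise circulation) and toy values are assumptions made for illustration; they are not the analysis code used in this study.

```python
import numpy as np


def max_overturning(psi, lat, depth, lat_range=(20.0, 80.0), min_depth=250.0):
    """Maximum of a stream function `psi` (depth x latitude, Sv) within
    `lat_range` (degrees north) and below `min_depth` (metres)."""
    lat_sel = (lat >= lat_range[0]) & (lat <= lat_range[1])
    depth_sel = depth > min_depth
    return float(psi[np.ix_(depth_sel, lat_sel)].max())


# Toy stream function on a coarse grid (values are illustrative only).
depth = np.array([100.0, 500.0, 1500.0, 3000.0])   # m
lat = np.array([-30.0, 0.0, 30.0, 60.0, 85.0])     # degrees north
psi = np.array([
    [2.0, 5.0, 20.0, 8.0, 1.0],     # 100 m: shallow cell, excluded by depth
    [3.0, 6.0, 12.0, 9.0, 18.0],    # 500 m: value at 85N excluded by latitude
    [1.0, 4.0, 14.5, 7.0, 2.0],     # 1500 m
    [-3.0, -2.0, 1.0, 0.5, 0.0],    # 3000 m: weak anti-clockwise cell in the south
])

print(max_overturning(psi, lat, depth))  # 14.5
```

Restricting the search in both latitude and depth prevents shallow wind-driven cells and high-latitude grid artefacts from being mistaken for the deep overturning maximum.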

Fig. 2 Maximum meridional overturning stream function in the Atlantic basin (red) and Pacific basin (blue) in SFLUX (solid line with circles) and VFLUX (dashed line with squares). The stream functions have been calculated between 20° and 80°N at depths greater than 250 m. Data are calculated from annual climate means. Observations show the range provided by Talley et al. (2003)

Table 2 Maximum Atlantic Meridional Overturning Circulation (AMOC) strength at 26.5°N

The MOC responses are not limited to changes in strength; there are also structural changes (Figs. 3, 4). In the Atlantic basin, NADW initially extends across all latitudes to depths of 3 km, with maximum strengths of ≈ 15 Sv (clockwise circulation; red colours in Fig. 3a, b). The AABW cell (anti-clockwise circulation; blue colours in Fig. 3a, b) fills the deep Atlantic southwards of 30°N and has a maximum strength of ≈ 5 Sv. After 10,000 years, the Atlantic overturnings are in a collapsed state (Fig. 3c, d). Atlantic MOC collapses in FAMOUS have only previously been simulated in ‘hosing experiments’, i.e. modelling studies in which freshwater is systematically added to the ocean (e.g. Smith and Gregory 2009). However, we postulate that the AMOC in the latest generation of the model (FM2) is primed to collapse in multi-millennial simulations without the need for freshwater forcing, as will be discussed further in Sect. 4. In the Pacific basin, the AABW cell initially fills the deep ocean, with maximum strengths in excess of 15 Sv (anti-clockwise circulation; Fig. 4a, b). There is shallow mixing in the tropics (above 1 km) and very weak (< 1 Sv), deep convection (< 3 km) in the subpolar North Pacific (clockwise circulation; Fig. 4a, b). However, by the end of the simulations, North Pacific Deep Water (NPDW) is the dominant water mass in the Northern Hemisphere, with maximum strengths of ≈ 15 Sv. The AABW cell has weakened (to a maximum strength of ≈ 9 Sv) and become more restricted in both lateral and vertical extent (Fig. 4c, d). Both the strength and structure of the PMOC at the end of these simulations resemble the strong Pacific overturning cell simulated by Jackson et al. (2016) in response to freshwater hosing compensated by the removal of freshwater from the surface ocean outside of the hosing region.

Fig. 3 Atlantic Meridional Overturning Circulation in SFLUX (a, c) and VFLUX (b, d) at the start of the simulation (a, b) and after 10,000 years (c, d). Red colours show clockwise circulation, blue colours indicate anti-clockwise circulation. Data are calculated from 100 year annual climate means

Fig. 4 As for Fig. 3, but for Pacific Meridional Overturning Circulation

The changes in ocean circulation between the beginning and end of the simulations are reflected in the changes in the model’s mixed layer depth (Fig. 5), which provides a useful approximation for sites of vertical convection (e.g. Lavender et al. 2002). There is a decrease in Atlantic-sector mixing in both simulations, with the mixed layer shoaling by more than 150 m in the Nordic Seas (Fig. 5a, c). The mixed layer in the Bering Sea deepens by more than 160 m, as deep water formation is invigorated in this region (Fig. 5b, d).

Fig. 5 Change in mixed layer depth between the start and end of the simulation in SFLUX (a, b) and VFLUX (c, d). Blue colours indicate shoaling of the mixed layer associated with a reduction in deep convection. Red colours show a deepening of the mixed layer associated with increased deep convection. Data are calculated from 100 year annual climate means

3.3 Surface climatologies

Further information regarding both the drivers and impacts of the changes in ocean circulation between the start and end of the two simulations can be observed in the surface climatologies of the Bering Sea and the North Atlantic Ocean (Fig. 6). During the first six millennia of both simulations, there is a small but persistent increase in sea surface temperature (SST; Fig. 6a, b) and sea surface salinity (SSS; Fig. 6c, d) in the northwest North Pacific Ocean, whilst the northeast North Atlantic Ocean surface becomes colder and fresher. After 6000 years, SSS in the Pacific region has risen by ≈ 0.85 psu in both simulations. Increasing SSS reduces the buoyancy of the surface waters (Fig. 6g, h), thereby promoting sinking and an intensification of overturning circulation rates. This brings more relatively warm, salty equatorial waters northwards, thus contributing to the gradual salinification and the ≈ 0.2 °C per millennium temperature increase. As the sea surface warms, it becomes more evaporative (Fig. 6e, f), thereby further increasing the SSS and invigorating the MOC. Thus, a positive feedback is initiated that links increasing SSSs and SSTs with enhanced evaporation and a strengthened MOC, and whilst warming partly counteracts the influence of rising salinity on density, it does not completely compensate for the effect. The surface climatologies then plateau during the final 4000 years of both simulations, consistent with the concurrent stabilisation of the MOCs.
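The competing effects of salinification and warming on surface density can be estimated with a linearised equation of state. This is a back-of-the-envelope sketch, not the model's own (nonlinear) equation of state: the reference density and the expansion and contraction coefficients are assumed, representative values, and the 1.2 °C warming is inferred here from ≈ 0.2 °C per millennium sustained over six millennia.

```python
RHO0 = 1027.0    # reference density, kg m^-3 (assumed)
ALPHA = 1.0e-4   # thermal expansion coefficient, K^-1 (assumed; it varies strongly with temperature)
BETA = 7.6e-4    # haline contraction coefficient, psu^-1 (assumed)


def density_change(d_temp, d_salt):
    """Linearised density change (kg m^-3) for small temperature and salinity changes."""
    return RHO0 * (BETA * d_salt - ALPHA * d_temp)


d_salt = 0.85        # psu, North Pacific SSS increase after 6000 years (from the text)
d_temp = 0.2 * 6.0   # deg C, assumed from ~0.2 deg C per millennium over six millennia

haline = density_change(0.0, d_salt)    # densification due to salinification alone
thermal = density_change(d_temp, 0.0)   # lightening due to warming alone
net = density_change(d_temp, d_salt)

print(f"haline {haline:+.2f}, thermal {thermal:+.2f}, net {net:+.2f} kg m^-3")
```

With these assumed coefficients the haline term is several times larger than the thermal term, which is consistent with the statement above that warming only partly offsets the density increase caused by rising salinity.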

Fig. 6 Northern North Atlantic and North Pacific sea surface temperature (a, b), sea surface salinity (c, d), sea surface evaporation (e, f), and sea surface density (g, h) in SFLUX (left) and VFLUX (right). Atlantic data were averaged over the area between 50°–70°N and 0°–30°W. Pacific data were averaged over the area between 50°–60°N and 165°–195°E. These regions correspond to the areas that experience the largest changes in mixed layer depth between the start and end of the simulations, as shown in Fig. 5

The response in the North Atlantic basin is non-linear in both simulations. In SFLUX, the surface ocean freshens by more than 2 psu during the first half of the simulation, with ≈ 1 psu of this freshening occurring between years 4500 and 5000 (Fig. 6c). In VFLUX, the SSS is stable during the first 3500 years of the simulation, with a gradual period of freshening occurring between 3500 and 5500 years. Between 5500 and 6000 years, a 1.25 psu drop in SSS is simulated (Fig. 6d). As the surface ocean freshens, the waters become less dense (Fig. 6g, h), inhibiting deep water formation and reducing the rate of overturning circulation. Consequently, colder waters accumulate in the region. This explains the ≈ 4 °C cooling that occurs in both simulations, ≈ 2 °C of which is concurrent with the periods of rapid freshening. Cooler waters are less evaporative; therefore, in the opposite feedback loop to the North Pacific, the freshness of the sea surface is enhanced.

Thus, although the salinity modifications maintain the global mean ocean salinity, the local salinity drifts and resultant changes in the MOC indicate that important imbalances exist in the simulation of the hydrological cycle. In both simulations, the northeast North Atlantic Ocean is initially ≈ 2.8 °C warmer and 0.7 psu saltier than the northwest North Pacific Ocean. However, by the end of the simulations, this relationship is reversed, with the North Atlantic region becoming ≈ 2.3 °C cooler and more than 2 psu fresher than the North Pacific region (Fig. 6a–d). It is the positive feedbacks associated with these local drifts that change the density of the surface waters and promote the reversal from AMOC-dominance to PMOC-dominance in the model when the simulation is sufficiently long.

It is important to address whether the switch between AMOC-dominance and PMOC-dominance is an inherent feature of long FAMOUS simulations, because the model is intended for multi-millennial simulations (as well as ensemble generation and examinations of model development). For example, it has already been used to better understand ice sheet dynamics (Gregoire et al. 2012, 2015) and associated sea level rises (Gregoire et al. 2016), and it will be used to conduct further long, transient climate simulations (Ivanovic et al. 2016). The abnormal overturning circulation states that we see here will impact the surface climate, the ocean carbon cycle and tracer distributions. This abnormality has not previously been observed in a pre-industrial control run, despite long integrations having been conducted using both FM1 (4000 years, Smith et al. 2008; 12,000 years, Smith 2012) and FM2 (3500 years, Williams et al. 2014). However, the maximum AMOC strength in FM2 reported by Williams et al. (2014) was at the lowermost limit of observational estimates (when the uncertainty ranges are taken into account; Table 2). We therefore postulate that overturning circulation is more sluggish in the latest generation of the model, and is primed to collapse when local surface imbalances pass a threshold during the course of multi-millennial simulations.

4 Comparison of FAMOUS-MOSES1 and FAMOUS-MOSES2.2

In this study, the collapsed AMOC and invigorated PMOC outlined in Sect. 3 are exclusive to the FM2 setup. In the following section, we will therefore investigate why these MOC changes are not seen in FM1 simulations, which have previously been shown to produce a reasonably strong AMOC (Table 2) and no identifiable PMOC. We examine differences in the initial surface climatologies of the two generations of the model, focussing on parameters that influence sea surface density, either directly or indirectly: precipitation, evaporation, SSS, SST and sea ice.

4.1 Meridional overturning circulation

The global volume-weighted mean salinity in FM2 is constant because of the inclusion of the aforementioned salinity drift modification (VFLUX). However, there is a redistribution of water between the Atlantic and Pacific basins because of enhanced evaporation over the Pacific and reduced evaporation over the Atlantic compared to FM1 and HadCM3 (discussed further in Sect. 4.2). As outlined in Sect. 3.2, FM2 therefore simulates a positive feedback loop that reduces the maximum AMOC strength from 14 to 4 Sv and initiates a PMOC that reaches a maximum strength of 14.5 Sv.

In FM1, there is a small global salinity drift of ≈ 0.02 psu per millennium, with both basins freshening at a similar rate (Fig. 7). In this simulation, the regional salinity drifts reflect the overall global salinity drift, which arises because the surface hydrological budget is not fully closed and the iceberg melt field is imperfectly calibrated (Sect. 2.1.2). As a result of these small salinity drifts, the AMOC gradually increases in strength from an initial value of 16.5 Sv, stabilising at ≈ 20 Sv after 3000 years (Fig. 8). The strength of shallow subtropical gyre overturning in the Pacific is stable (≈ 4.5 Sv) throughout the FM1 simulation, and there is no NPDW formation.
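The two quantities discussed here, a volume-weighted mean salinity and its drift rate, can be illustrated with a minimal sketch. The function names, cell volumes and salinity values below are hypothetical and are not taken from FAMOUS output; the drift is estimated with an ordinary least-squares slope, which is an assumption about how such a rate might be diagnosed.

```python
def volume_weighted_mean(salinity, volume):
    """Return sum(S_i * V_i) / sum(V_i) for paired grid-cell values."""
    total_volume = sum(volume)
    return sum(s * v for s, v in zip(salinity, volume)) / total_volume


def drift_per_millennium(years, means):
    """Least-squares slope of a mean-salinity series, in psu per 1000 years."""
    n = len(years)
    mean_t = sum(years) / n
    mean_s = sum(means) / n
    cov = sum((t - mean_t) * (s - mean_s) for t, s in zip(years, means))
    var = sum((t - mean_t) ** 2 for t in years)
    return 1000.0 * cov / var


# Hypothetical three-cell ocean: salinity (psu) and cell volume (m^3).
cell_salinity = [35.0, 34.5, 34.6]
cell_volume = [1.0e15, 2.0e15, 1.0e15]
global_mean = volume_weighted_mean(cell_salinity, cell_volume)

# Hypothetical series that freshens by 0.02 psu per millennium.
years = [0, 1000, 2000, 3000]
means = [34.65 - 0.02 * (t / 1000.0) for t in years]
drift = drift_per_millennium(years, means)

print(global_mean, drift)
```

Applying the same mean separately to the cells of each basin would give the kind of regional curves that are compared with the global curve in Fig. 7.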

Fig. 7

Volume-weighted mean salinity in FM1 (solid line) and FM2 (dashed line) calculated for the global ocean (black), Atlantic basin (red) and Pacific basin (blue). Data are calculated from annual climate means

Fig. 8

Maximum meridional overturning stream function in the Atlantic basin (red) and Pacific basin (blue) in FM1 (solid line with circles) and FM2 (dashed line with squares). The stream functions have been calculated between 20° and 80°N at depths greater than 250 m. Data are calculated from annual climate means. Observations show the range provided by Talley et al. (2003)

Consequently, the MOC structures in FM1 provide a better representation of pre-industrial ocean circulation than those in FM2, when compared with pre-industrial simulations with HadCM3 (e.g. Jackson and Vellinga 2012) and observational estimates for the modern period (Talley et al. 2003). At the start of the FM1 simulation, NADW extends across all latitudes to depths of 3 km, with a maximum strength of 16.5 Sv. A weak AABW cell, with a maximum strength of 3 Sv, is also simulated below 3.5 km and southwards of 20°N (Fig. 9a). By the end of the simulation, the NADW cell extends across all depths and latitudes, with a maximum strength of ≈ 20 Sv. There is no AABW cell because AABW formation in the Southern Ocean is too weak (Fig. 9c). This is a known limitation of the FAMOUS GCM (Smith 2012; Williams et al. 2013), and it demonstrates why single number metrics should not be used as comprehensive measures of the MOC. The over-deep NADW cell is an important model bias that also has implications for the ocean carbon cycle and ocean tracer distributions. However, this is not accounted for when quoting only the maximum AMOC strength.
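The single-number metric plotted in Fig. 8 (the maximum stream function between 20° and 80°N at depths greater than 250 m) can be sketched as follows. This is an illustrative reconstruction rather than the authors' diagnostic code: the function name, the toy grids and the stream function values are hypothetical, and positive values are assumed to denote clockwise overturning.

```python
def max_overturning(psi, lats, depths, lat_min=20.0, lat_max=80.0, min_depth=250.0):
    """Maximum of psi[k][j] (Sv) for lat_min <= lat <= lat_max and depth > min_depth."""
    best = None
    for k, depth in enumerate(depths):
        if depth <= min_depth:
            continue  # skip levels at or above the depth threshold
        for j, lat in enumerate(lats):
            if lat_min <= lat <= lat_max:
                value = psi[k][j]
                if best is None or value > best:
                    best = value
    return best


lats = [-30.0, 10.0, 40.0, 60.0]          # degrees north
depths = [100.0, 500.0, 2000.0, 4000.0]   # metres

# Hypothetical "start" state: clockwise upper cell over a weak negative abyssal cell.
psi_start = [
    [5.0, 8.0, 25.0, 10.0],    # 100 m (excluded by the depth threshold)
    [4.0, 9.0, 16.5, 12.0],    # 500 m
    [2.0, 6.0, 10.0, 7.0],     # 2000 m
    [-3.0, -3.0, -1.0, 0.0],   # 4000 m
]

# Hypothetical "end" state: a single clockwise cell filling the whole column.
psi_end = [
    [5.0, 8.0, 25.0, 10.0],
    [5.0, 10.0, 20.0, 14.0],
    [4.0, 9.0, 15.0, 10.0],
    [2.0, 5.0, 8.0, 4.0],
]

start_max = max_overturning(psi_start, lats, depths)
end_max = max_overturning(psi_end, lats, depths)
print(start_max, end_max)
```

In this toy case the metric rises from 16.5 to 20 Sv, but the returned value alone gives no indication that the negative abyssal cell has disappeared, which is the limitation noted above.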

Fig. 9

a, c Atlantic and b, d Pacific Meridional Overturning Circulation in FM1 at the start of the simulation (a, b) and after 10,000 years (c, d). Red colours show clockwise circulation, blue colours indicate anti-clockwise circulation. Data are calculated from 100 year annual climate means

Pacific circulation in FM1 remains stable throughout the simulation and is a closer match to modern observations than the abnormal PMOC that is simulated in FM2. Antarctic Bottom Water extends across almost all latitudes below 1–2 km and there is shallow convective mixing in the tropical Northern Hemisphere (Fig. 9b, d).

4.2 Northern Hemisphere surface climatologies

The surface ocean in the Nordic Seas in FM2 is initially warmer and fresher than in FM1, whilst the Bering Sea is warmer, saltier and drier (due to a combination of reduced precipitation and increased evaporation). The temperature and salinity in these two regions affect the density of the surface waters with significant repercussions for deep water formation and density-driven circulation. These intrinsic biases are amplified by the positive MOC feedbacks that occur in FM2 (as outlined in Sect. 3.3), resulting in more substantial differences between the surface climatologies of the two generations of FAMOUS, the parent model, and observations after 10,000 years (Online resource 3).

Differences exist between the simulations’ hydrological cycles, as demonstrated by the precipitation, evaporation (Fig. 10), and SSS anomalies (Fig. 11). Initially, there are large differences in the annual mean P−E (ranging from − 14.0 to + 7.7 mm day⁻¹) between the three simulations in the Intertropical Convergence Zone, which primarily reflect differences in both the amount and structure of low latitude precipitation. Because this is a region of high rainfall (with > 12 mm day⁻¹ simulated in FAMOUS and > 18 mm day⁻¹ simulated in HadCM3), small shifts in the precipitation patterns create large anomalies, resulting in their disproportionate representation.

Fig. 10

Difference in precipitation (left) and evaporation (right): a, b FM1 minus FM2, c, d FM1 minus HadCM3, and e, f FM2 minus HadCM3. FAMOUS data are the annual climate means calculated from the first 100 years of the simulations. HadCM3 data are the 240-year annual climatology. Hatched areas show the approximate locations of the simulated Northern Hemisphere deep water formation regions, which are in good agreement with observed deep and intermediate water formation regions in the Nordic Seas and North Pacific Ocean, respectively. Note that FAMOUS does not simulate deep water formation in the Labrador Sea

Fig. 11

Difference in sea surface salinity: a FM1 minus FM2, b HadCM3 minus observations, c FM1 minus HadCM3, d FM1 minus observations, e FM2 minus HadCM3, and f FM2 minus observations. FAMOUS data are the annual climate means calculated from the first 100 years of the simulations. HadCM3 data are the 240-year annual climatology. Observations are the 1955–2012 surface climatology from the World Ocean Atlas 2013 version 2 (Zweng et al. 2013). Hatched areas show the approximate location of the simulated Northern Hemisphere deep water formation regions as per Fig. 10

Elsewhere, the most striking P−E anomaly is in the northwest North Pacific, where both versions of FAMOUS simulate a relative drying (due to reduced precipitation) compared to HadCM3. The discrepancy is more pronounced in FM2 than FM1, because FM1 has a compensating bias between insufficient precipitation (Fig. 10c) and insufficient evaporation (Fig. 10d). Improvements in FM2 have increased evaporation rates over the North Pacific but have not substantially altered the precipitation rates. Consequently, the northwest North Pacific Ocean in FM2 is > 1 psu more saline than in FM1 and > 2 psu more saline than HadCM3. However, the majority of the global surface ocean in HadCM3 is too fresh compared to observations, with an average anomaly of approximately 1 psu in the Pacific Ocean (Fig. 11b; Zweng et al. 2013). The SSS biases are therefore more complex when comparing FAMOUS directly to observations, although the North Pacific deep water formation region remains too saline in both generations of the model.

Both versions of FAMOUS are also > 2 °C warmer than HadCM3 in the northeast North Pacific Ocean, and > 1 °C colder in the northwest North Pacific (Fig. 12). However, HadCM3 has a prominent cold bias in the North Pacific Ocean (Fig. 12b). Sea surface temperatures across the majority of the North Pacific are therefore too cold when comparing FAMOUS directly to observations (Fig. 12d, f). Consequently, FAMOUS simulates too much sea ice in the northwest North Pacific (Fig. 13), which is another known limitation of the model (Smith et al. 2008). North of the Bering Strait, FAMOUS is > 3 psu saltier and up to 3 °C warmer than both HadCM3 and observations, with insufficient Arctic sea ice. We suggest that these biases occur because the closed seaway reduces high-latitude exchange, and they are more pronounced in FM2. Furthermore, FM2 is up to 4 °C warmer than FM1 in the Bering Sea (Fig. 12a), which contributes towards the enhanced evaporation rates (Fig. 10a) and increased SSS (Fig. 11a) in this region. Although the simulated warming somewhat counteracts the influence of elevated SSSs on sea surface density in the Bering Sea in FM2, the effect of the elevated SSSs (increasing density) outweighs the effect of the elevated SSTs (reducing density). Consequently, surface waters become less buoyant compared to FM1 and HadCM3, reducing water column stability and promoting deep water formation.
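The competition between warming and salinification can be illustrated with a linearised equation of state, Δρ ≈ ρ₀(−αΔT + βΔS). The sketch below is not part of the original analysis: the reference density and the expansion and contraction coefficients are assumed, order-of-magnitude values for cold surface water, and the anomalies (4 °C and 1 psu) simply reuse the magnitudes quoted above for the Bering Sea.

```python
RHO0 = 1027.0    # reference density, kg m^-3 (assumed)
ALPHA = 1.0e-4   # thermal expansion coefficient, K^-1 (assumed cold-water value)
BETA = 7.8e-4    # haline contraction coefficient, psu^-1 (assumed)


def density_anomaly(delta_t, delta_s):
    """Linearised density change (kg m^-3) for anomalies delta_t (K) and delta_s (psu)."""
    return RHO0 * (-ALPHA * delta_t + BETA * delta_s)


thermal_part = density_anomaly(4.0, 0.0)   # warming alone lowers density
haline_part = density_anomaly(0.0, 1.0)    # salinification alone raises density
net_change = density_anomaly(4.0, 1.0)     # combined effect

print(thermal_part, haline_part, net_change)
```

With these assumed coefficients the haline term is roughly twice the magnitude of the thermal term, so the net change is positive (denser water). The balance is sensitive to the value of the thermal expansion coefficient, which increases with temperature, so the sketch should be read as qualitative only.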

Fig. 12

As for Fig. 11, but for sea surface temperature. Observations are the 1955–2012 surface climatology from the World Ocean Atlas 2013 version 2 (Locarnini et al. 2013)

Fig. 13

Difference in sea ice area fraction in the Northern Hemisphere (a–c) and the Southern Hemisphere (d–f). FAMOUS data are the annual climate means calculated from the first 100 years of the simulations. HadCM3 data are the 240-year annual climatology. Hatched areas show the approximate location of the simulated Northern Hemisphere deep water formation regions as per Fig. 10. Text markers denote the location of the Amundsen Sea (A), the Bellingshausen Sea (B), the Davis Sea (D), George V Land (G), the Ross Sea (R) and the Weddell Sea (W)

Both versions of FAMOUS simulate temperatures up to 9 °C cooler than HadCM3 and observations around the coast of Greenland and Iceland (Fig. 12), which is linked with expanded annual sea ice in the Nordic and Labrador Seas (Fig. 13). FM1 simulates the lowest temperatures and the most sea ice in these regions, which creates denser surface waters (as poleward-bound water in the upper AMOC limb cools) and enhances deep water formation. With less brine rejection occurring in FM2, the North Atlantic sea surface is ≈ 1 psu fresher than in FM1 (Fig. 11a). Consequently, the North Atlantic surface waters in FM2 are more buoyant, which inhibits convective mixing in the Nordic Seas.

Overall, FM2 is initially warmer and fresher than FM1 in the North Atlantic deep water formation regions. These biases increase the buoyancy of the surface waters in FM2 relative to FM1, inhibiting deep convection. In the Bering Sea, FM2 is initially warmer and drier (due to increased evaporation and reduced precipitation), which makes the sea surface more saline, and hence the net effect is to increase the density of the Pacific surface waters. The subsequent reduction in AMOC strength and invigoration of the PMOC (as described in Sect. 3.2) act to amplify these biases. We therefore propose that the salinity drift correction schemes (Sects. 2.2.1, 3) only alter the timing of the MOC tendencies in FM2 and do not prevent them.

Observational estimates suggest that, in the modern oceans, approximately 0.8 Sv of freshwater is transported from the North Pacific Ocean into the Arctic Ocean via the Bering Strait (Coachman and Aagaard 1988). Modelling studies have examined the impacts of an open versus closed Bering Strait on the strength of the AMOC, demonstrating that a closed Bering Strait increases the salinity of the Arctic and North Atlantic Oceans, which results in a stronger AMOC (e.g. Shaffer and Bendtsen 1994; Goosse et al. 1997; Wadley and Bigg 2002). In contrast, an open Bering Strait increases the freshwater input into the North Atlantic Ocean, which suppresses the AMOC. The closed Bering Strait could therefore be contributing to the MOC problems in the FM2 simulations, because the salinity anomalies are being trapped in their respective basins. An opening of this gateway could provide an important negative feedback mechanism for redistributing freshwater and preventing the build-up of salinity in the North Pacific in FM2. However, opening the Bering Strait is non-trivial and to date has always caused the circulation in FAMOUS to become unstable.

4.3 Southern Ocean

Although the focus of this study has been on the Northern Hemisphere driven MOCs, we briefly note that there are also differences between the two generations of FAMOUS in the Southern Ocean. With no transport through the Bering Strait, all exchange between the Atlantic and Pacific basins goes through the Southern Ocean. Biases in this region are therefore important for Southern Hemisphere deep water formation, the accurate simulation of abyssal water mass properties, and the reinforcement of climatological trends in the Atlantic and Pacific basins.

Southern Ocean SSTs in FAMOUS are initially 0.5–6 °C warmer than in HadCM3 (Fig. 12), and there is less sea ice in the Ross, Weddell and Davis Seas (Fig. 13). With less brine rejection occurring in FAMOUS, the surface of the Southern Ocean is > 3 psu fresher than in HadCM3 (Fig. 11), which will increase surface buoyancy and vertical stratification. FM2 has more sea ice than FM1 in the Bellingshausen, Amundsen and Ross Seas, where SSTs are up to 2 °C colder, and less sea ice in the Weddell Sea and off the coast of George V Land, where SSTs are up to 2.5 °C warmer. As previously discussed, surface biases impact the water column properties. The consequence is that FM2 is colder and fresher than FM1 in the top 200 m of the Southern Ocean, and warmer and more saline below this depth (Fig. 14). Where they counteract, the effect of salinity tends to outweigh the effect of temperature on water buoyancy. FM1 is therefore denser in the uppermost kilometre of the ocean, whilst the FM2 water column is more stratified, which inhibits vertical convection and the formation of AABW.

Fig. 14

Southern Ocean stratification in FM1 (left) and FM2 (middle) as identified from salinity (a–c), potential temperature (d–f), and potential density (g–i). Data are the zonal annual climate means calculated from the first 100 years of the simulations. Note that depth is non-linear

Antarctic Bottom Water formation is well correlated with the strength of the Antarctic Circumpolar Current (ACC; Gent et al. 2001; Smith 2012), which is the strongest current in the global oceans. The ACC extends between 56° and 62°S (Johnson and Bryden 1989) and has measured strengths of 137 ± 8 Sv in the Drake Passage (Cunningham et al. 2003). For comparison, the observed strength of the Gulf Stream in the Florida Straits is ≈ 32 Sv, increasing to 120 Sv off the coast of Newfoundland (Meinen and Luther 2016). The mean transport of the East Australia Current at 30°S has been estimated at ≈ 22 Sv (Mata et al. 2000) and the total transport of the Agulhas Current between 27°S and 40°S is ≈ 70 Sv (Bryden et al. 2005). The ACC strength in FM1 (30 Sv) is weaker than the range of values in the Intergovernmental Panel on Climate Change’s Fourth Assessment Report (IPCC 2007). However, it is stronger than the 15–20 Sv simulated in FM2 (Fig. 15). Overall, the combination of an ACC that is operating at less than a quarter strength and weak AABW formation will further limit the amount of exchange that occurs between the major ocean basins, thereby reinforcing the Atlantic and Pacific trends. The effect is greater in FM2 because its ACC is approximately half the strength of the ACC in FM1.

Fig. 15

Maximum Antarctic Circumpolar Current strength in FM1 (blue) and FM2 (green), calculated as the integrated total ocean U-velocity across the Drake Passage
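The transport diagnostic described in the caption, and the comparison with the observed 137 Sv, can be sketched as below. The section geometry, velocities and function names are hypothetical and chosen only to make the arithmetic easy to follow; they do not represent the FAMOUS grid.

```python
def section_transport_sv(u, dy, dz):
    """Integrate zonal velocity u[k][j] (m s^-1) over a meridional section.

    dy[j] are cell widths (m) and dz[k] are layer thicknesses (m).
    The result is returned in Sverdrups (1 Sv = 1e6 m^3 s^-1).
    """
    total = 0.0
    for k, thickness in enumerate(dz):
        for j, width in enumerate(dy):
            total += u[k][j] * width * thickness
    return total / 1.0e6


# Hypothetical section: four 200 km wide columns and three layers.
dy = [2.0e5, 2.0e5, 2.0e5, 2.0e5]
dz = [500.0, 1500.0, 2000.0]
u = [
    [0.02, 0.02, 0.02, 0.02],          # upper layer
    [0.01, 0.01, 0.01, 0.01],          # mid-depth layer
    [0.0025, 0.0025, 0.0025, 0.0025],  # deep layer
]
toy_transport = section_transport_sv(u, dy, dz)

# Ratios using the values quoted in the text.
observed_acc = 137.0            # Sv, Cunningham et al. (2003)
fm1_acc = 30.0                  # Sv
fm2_acc_range = (15.0, 20.0)    # Sv
fm1_fraction = fm1_acc / observed_acc
fm2_relative_to_fm1 = [value / fm1_acc for value in fm2_acc_range]

print(toy_transport, fm1_fraction, fm2_relative_to_fm1)
```

The FM1 value is about 22% of the observed transport, consistent with the statement that the simulated ACC operates at less than a quarter strength, and the FM2 range corresponds to roughly one half to two thirds of the FM1 value.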

5 Discussion and conclusions

As with many GCMs, the surface hydrological budget in FAMOUS is not fully closed, with imbalanced hydrology over inland seas and insufficient snowmelt leading to small salinity drifts. These drifts may accumulate over time; therefore they are particularly important in the long integrations for which FAMOUS was partially intended. Any drifts in surface salinity can eventually propagate into the deep ocean, affecting the global MOC, and consequently, poleward heat transport, surface climate, the ocean carbon cycle, and ocean tracer distributions.

Two different methods were employed to account for these imbalances and neutralise the salinity drifts in the most recent version of the model (FM2): the first (SFLUX) maintains the global volume-weighted mean salinity at 34.65 psu by applying a surface flux correction; the second (VFLUX) enforces zero drift by applying a small salinity tendency equally throughout the depth of the ocean to cancel out any net surface forcing. Both methods successfully maintained a steady global mean salinity. However, neither method prevented regional salinity drifts. Small but persistent increases in SSS were simulated in the northwest North Pacific Ocean, with concurrent freshening in the northeast North Atlantic. These changes stimulated positive feedback loops in the MOC and surface climate, leading to the development of a strong, deep PMOC and a collapsed AMOC after 6000 years. For example, in the Pacific basin, increasing SSS raised the sea surface density, which promoted sinking and an intensification of the PMOC. The stronger MOC enhanced poleward salt and heat transport, thereby increasing regional SSS and SSTs. The warmer sea surface was more evaporative, which led to further increases in SSS. The opposite biases and feedbacks operate in the North Atlantic, so the net effect there is a weakening of the AMOC. Both simulations exhibited a period of metastability before the trends resumed, highlighting the risk of carrying out model integrations that are too short when studying deep ocean circulation and ocean biogeochemical cycles.
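Why a globally uniform correction can hold the global mean steady without preventing regional drift can be seen in a toy two-box calculation. The sketch below is a deliberate simplification and not the VFLUX implementation: the two "basins", their relative volumes and the constant salinity tendencies are all hypothetical, and no circulation feedbacks are represented.

```python
def step_with_uniform_correction(salinity, volume, tendency):
    """Apply per-box salinity tendencies, then remove their volume-weighted mean.

    Subtracting the same value from every box cancels the net forcing,
    so the volume-weighted global mean salinity is unchanged.
    """
    total_volume = sum(volume)
    net = sum(t * v for t, v in zip(tendency, volume)) / total_volume
    return [s + t - net for s, t in zip(salinity, tendency)]


def global_mean(salinity, volume):
    """Volume-weighted mean salinity of all boxes."""
    return sum(s * v for s, v in zip(salinity, volume)) / sum(volume)


# Hypothetical boxes: index 0 = "Atlantic", index 1 = "Pacific".
volume = [1.0, 2.0]               # relative volumes
salinity = [34.65, 34.65]         # psu
tendency = [-0.05, 0.01]          # psu per step: net freshening plus redistribution

for _ in range(10):
    salinity = step_with_uniform_correction(salinity, volume, tendency)

final_mean = global_mean(salinity, volume)
contrast = salinity[1] - salinity[0]
print(salinity, final_mean, contrast)
```

After ten steps the global mean is still 34.65 psu, yet the hypothetical Pacific box has become about 0.6 psu saltier than the Atlantic box. The correction removes the net drift but leaves the inter-basin redistribution untouched.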

Comparing two different generations of the model (FM1 and FM2) suggests that these problems are endemic to multi-millennial simulations with FM2 (the latest generation of the model), and would occur regardless of the choice of salinity drift correction scheme. The sea surface in the North Atlantic deep water formation regions in FM2 is intrinsically too warm and fresh, which inhibits sinking, whilst the combination of too much evaporation and insufficient precipitation in the northwest North Pacific Ocean increases the SSS and initiates deep convection. Thus, we postulate that the AMOC in the latest generation of the model is primed to collapse in multi-millennial simulations, without the need for freshwater forcing. FM1 (the older generation of the model) does not suffer the same biases in the sea surface climatologies. Erroneous pre-industrial NPDW formation therefore does not occur in FM1, and the AMOC remains strong (though over-deep). The over-deep NADW cell is also an important model bias that will have implications for the surface climate and ocean tracer distributions. It would therefore be valuable for future model development work to focus on increasing the strength of the ACC, shoaling the NADW cell, and maintaining an abyssal AABW cell in the deep Atlantic basin. Currently, the importance of the surface imbalances and MOC differences depends on the scientific questions for which the model is being used. However, where long integrations are required, FM1 is a more appropriate choice as it provides a better representation of the pre-industrial MOC than FM2.

The MOC abnormalities that occur in the FM2 simulations could potentially be resolved with tuning, which is a calibration process that attempts to find the optimal values of uncertain model parameters and minimise the discrepancy between observations and model output (Gregoire et al. 2011). FAMOUS has previously been tuned both systematically (Jones et al. 2005; Gregoire et al. 2011; Williams et al. 2013) and manually (Smith et al. 2008), with the focus being on the tuneable parameters that have a high impact on the climate in HadCM3, such as the threshold of relative humidity for cloud formation and the conversion rate of cloud liquid water droplets to precipitation. Other physical parameters that have been tuned include the sea ice albedo, and the atmospheric and oceanic diffusion parameters (Gregoire et al. 2011). Variations in these parameters affect the model’s P−E balance, and therefore the SSS, sea surface density, and MOC. For example, the simulation of clouds is crucial for the simulation of precipitation, and also planetary albedo, which influences surface temperatures, and consequently evaporation rates. Typically, perturbed parameter simulations have a centennial run length (e.g. Gregoire et al. 2011; Williams et al. 2013). There is therefore a risk that small errors in the optimised surface climatologies may not be identified during the calibration process. As demonstrated by this study, these become important on longer timescales when localised drifts occur. However, it is not feasible to conduct hundreds of multi-millennial perturbed parameter simulations with a model of this complexity and resolution. A more suitable methodology would be to conduct multi-millennial runs with a subset of high performing simulations from an ensemble of centennial integrations, offering a way forward for an improved tuning approach. In the process of choosing optimal parameter values for these extended simulations, we suggest that particular attention should be paid to the temperature and salinity balances in the surface ocean, particularly in the deep water formation regions.

Overall, this study demonstrates that small, regional biases in the sea surface climate are important for the accurate simulation of the MOC on multi-millennial timescales, because they can cause regional salinity drifts even when the global hydrological cycle has been forcibly closed. Currently, we have not identified any specific trends that occur well in advance of the final collapse of the AMOC, which would allow other model users to diagnose potential problems in the overturning circulation without the need to run for multiple millennia. However, we suggest that the northwest North Pacific Ocean (50°–60°N; 165°–195°E) and northeast North Atlantic Ocean (50°–70°N; 0°–30°W) are important areas that should be closely monitored for imbalances in the surface hydrology that may redistribute freshwater between sites of deep water formation.
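As a starting point for the monitoring suggested above, an area-weighted mean of a surface field over each of the two boxes could be computed along the following lines. This is a minimal sketch under stated assumptions: a regular latitude-longitude grid with cosine-of-latitude weighting, longitudes expressed in degrees east, and a hypothetical coarse SSS field. The function names are illustrative and only the box limits come from the text.

```python
import math


def in_lon_range(lon, west, east):
    """True if lon lies between west and east (degrees east), travelling eastward."""
    lon, west, east = lon % 360.0, west % 360.0, east % 360.0
    if west <= east:
        return west <= lon <= east
    return lon >= west or lon <= east  # range wraps across the prime meridian


def box_mean(field, lats, lons, lat_s, lat_n, lon_w, lon_e):
    """Cosine-latitude weighted mean of field[j][i] inside a latitude-longitude box."""
    weighted_sum = 0.0
    weight_total = 0.0
    for j, lat in enumerate(lats):
        if not lat_s <= lat <= lat_n:
            continue
        weight = math.cos(math.radians(lat))
        for i, lon in enumerate(lons):
            if in_lon_range(lon, lon_w, lon_e):
                weighted_sum += weight * field[j][i]
                weight_total += weight
    if weight_total == 0.0:
        raise ValueError("no grid points fall inside the requested box")
    return weighted_sum / weight_total


# Hypothetical coarse grid and sea surface salinity field (psu).
lats = [45.0, 55.0, 65.0]
lons = [0.0, 90.0, 180.0, 270.0, 345.0]
sss = [
    [35.0, 35.0, 34.0, 35.0, 35.2],   # 45N
    [34.0, 35.0, 33.5, 35.0, 34.0],   # 55N
    [34.0, 35.0, 33.0, 35.0, 34.0],   # 65N
]

# Northwest North Pacific: 50-60N, 165-195E.
pacific_sss = box_mean(sss, lats, lons, 50.0, 60.0, 165.0, 195.0)
# Northeast North Atlantic: 50-70N, 0-30W, i.e. 330-360E.
atlantic_sss = box_mean(sss, lats, lons, 50.0, 70.0, 330.0, 360.0)

print(pacific_sss, atlantic_sss, pacific_sss - atlantic_sss)
```

Tracking the difference between the two box means through a simulation would give a simple time series of the inter-basin surface contrast, although the study has not established that any such index provides early warning of an AMOC collapse.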