Surface heat fluxes over the ocean form a key component of the Earth’s energy budget (Peixoto and Oort 1992) as the oceans comprise the main heat storage reservoir in the climate system and so determining how and where the oceans are warming up should provide important constraints for IPCC-class coupled climate models (e.g., Palmer and McNeall 2014). The air–sea heat fluxes are also needed to provide atmospheric forcing fields for ocean-only models (e.g., the Coordinated Ocean–ice Reference Experiments (COREs) documented in Griffies et al. 2009) and to assess heat budgets and the implied meridional transports of heat in comparison with direct oceanic transport estimates based on hydrographic section data (e.g., Bryden and Imawaki 2001; Macdonald and Baringer 2013). There has been increased interest recently in trying to improve air–sea heat flux estimates because of the availability of new satellite radiation data at the top of the atmosphere, e.g., from Clouds and Earth’s Radiant Energy Systems (CERES; Loeb et al. 2009), and the availability of the Argo float network since the early 2000s for measuring changes in the upper 2000 m ocean heat content. A review of the current state of the art and the potential goals for the future can be found in WCRP (2012), Josey et al. (2013), Yu et al. (2013) and von Schuckmann et al. (2015).

Air sea fluxes are notoriously difficult to determine on large space (basin-scale) and timescales (interannual-to-decadal) mainly due to sensitivity of the fluxes to parameterizations of boundary layer processes, cloud radiative feedbacks and to the highly variable wind speed and sea state conditions; see WGASF (2000). Global products based on locally estimated quantities such as the National Oceanography Centre (NOC) flux products (Josey et al. 1999; Berry and Kent 2009), which used ship observations from the International Comprehensive Ocean–Atmosphere Data Set (ICOADS), often show unrealistically large heat flux biases (+15–30 W m−2) when integrated over global scales. Some atmospheric reanalysis products, such as the NCEP/NCAR reanalysis R1 (Kalnay et al. 1996; referred to as NCEP-R1) and the NCEP/DOE reanalysis R2 (Kanamitsu et al. 2002; referred to as NCEP-R2), have global budgets that are close to being balanced (~3 W m−2). However, others such as the ECMWF ERA-Interim reanalysis (Dee et al. 2011; referred to as ERAi), or the NASA MERRA reanalysis (Rienecker et al. 2011), still have significant global heat budget imbalances (+11 W m−2 for ERAi and +21 W m−2 for MERRA, see Josey et al. 2013, Table 5.1). Furthermore, they cannot reliably reproduce surface fluxes that directly depend on clouds (such as radiation and precipitation) because cloud observations are not assimilated, and the turbulent fluxes also suffer from the absence of coupled feedbacks with the ocean due to the fixed surface temperature boundary conditions that do not allow ocean temperatures to respond, e.g., underneath mid-latitude or tropical storms.

Josey and Smith (2006) argued that progress in flux field development requires careful evaluation of global flux products against independent observations, particularly from surface flux buoys. Many flux products now exist including some based on satellite data, such as the Japanese Ocean Flux data sets with Use of Remote sensing Observations (J-OFURO; Kubota et al. 2002), the Hamburg Ocean Atmosphere Parameters and Fluxes from Satellite Data (HOAPS, Andersson et al. 2010) and the IFREMER turbulent flux product documented in Bentamy et al. (2013). Examples of hybrid products (a combination of atmospheric reanalysis and satellite measurements) adjusted with satellite and/or in situ measurements include the Common Ocean–ice Reference Experiments (CORE.2) dataset (Large and Yeager 2009) and the WHOI Objectively Analyzed Ocean–Atmosphere Fluxes (OAFlux) product (Yu et al. 2008), which is the product that has been most completely evaluated against in situ flux measurements so far.

This paper is a first review of surface heat fluxes from an ocean reanalysis perspective, using the joint CLIVAR-GSOP/GODAE OceanView Ocean Reanalysis Intercomparison Project (ORA-IP) datasets (Balmaseda et al. 2015 and CLIVAR Exchanges no. 64). These ocean reanalyses (hereafter the ORA-IP products) are examples of ocean data assimilated models that are actively being used either for climate monitoring studies, e.g., ocean heat content (Xue et al. 2012; Balmaseda et al. 2013b) or steric sea level (Storto et al. 2015) variability, or for operational ocean forecasting (Lellouche et al. 2013; Blockley et al. 2014). Air–sea heat fluxes are made up of short and longwave radiation terms, along with turbulent fluxes for heat (sensible and latent) computed from bulk formulae, with both the outgoing longwave radiation (computed using the Stephan–Boltzmann Law) and the turbulent fluxes depending sensitively on the sea surface temperature (SST). For ORA-IP, air–sea fluxes are then generally computed based on prescribed atmospheric states from either atmospheric reanalyses or a blend of atmospheric reanalysis and satellite observations. However, because the SSTs are influenced by the near surface data being assimilated and used to constrain the different ORA-IP products, the results from ORA-IP still develop a range of estimated air–sea fluxes.

This comparison examines both net surface heat fluxes and the individual flux components, using the ORA-IP ensemble means and spreads for years 1993–2009. The analysis is limited both by the variety of forcing design across the different products (details provided in Sect. 2), and the availability of output data from the products, therefore it is very hard to account for all the differences. A broader discussion of other ways forward is presented in the Sect. 6 at the end of the paper.

The paper outline is as follows. A summary of the forcing methods used for the ocean reanalyses, along with details of the assimilation increment diagnostics and additional flux data sets used in the comparison, are given in Sect. 2. Section 3 looks at the global time-mean heat budgets and compares these with global flux products from a variety of sources, including ship- and satellite-based products and atmospheric reanalysis fluxes. It also looks at the implied meridional heat transports and the role of the assimilation increments in closing the mean ocean heat budget. Section 4 looks at the temporal variability of the surface heat fluxes at seasonal and interannual timescales based on an ORA-IP ensemble of surface heat flux estimates. Section 5 makes comparisons of monthly mean ORA-IP surface heat fluxes with in situ flux data taken from the operational tropical moored buoy arrays and one OceanSITES station (Stratus) in the eastern South Pacific, where longer flux records exist. Section 6 provides a summary and further discussion.

Heat flux products

Surface heat fluxes in ORA-IP

Sixteen monthly surface heat flux products originating from ocean and coupled reanalyses have been compared over the 17-year period (1993–2009). These reanalyses have each been run with different model configurations, data assimilation systems and observational data sources. Table 1 summarises the ocean/coupled model frameworks and the choices made by each of the reanalysis for computing surface boundary forcing, including the atmospheric data sets, bulk formulae and SST observational products. Details beyond those provided in this table can be found in the cited references.

Table 1 Summary of the surface heat flux datasets from ocean or coupled reanalyses (ORA-IP products) used in the comparison study

There are three main approaches that have been used to generate air–sea fluxes in ORA-IP: (1) “Flux Correction”, (2) “Bulk Flux Forcing”, and (3) “Coupled Model Fluxes”, although the variety of treatments within these classes is great and later results cannot be easily distinguished according to these classifications.

  • “Flux Correction” for PEODAS, ORAS4 and GODAS

With this approach, the surface fluxes (for momentum, heat and freshwater) from an atmospheric reanalysis product are applied directly to the ocean surface, along with a surface relaxation of SST towards an observational product to prevent model drift. In ORA-IP, this SST restoring is often applied with rather short damping timescales (typically, 1–5 days) as a way of assimilating SST satellite gridded data into the models (see seventh column of Table 1).

  • “Bulk Flux Forcing” for MOVECORE, MOVEG2, GECCO2, ECCOv4, CGLORS05v3, UR025.3, UR025.4, GloSea5, GLORYS2v1 and GLORYS2v3

In this approach, the turbulent fluxes (for heat, water and momentum) are derived from bulk formula using a prescribed atmospheric state and the model’s SST, which may also be affected by data assimilation (see Table 1). Nine of the reanalyses employ the CORE bulk formulae described in Large and Yeager (2004, 2009) applied using atmospheric data from either an atmospheric reanalysis (ERAi, NCEP-R1 or JRA-55) or a blend of reanalysis data and satellite observations (CORE.2). The near surface atmospheric variables at high temporal resolution (6- or 3-h) are adjusted from 2 to 10 m, following the Monin–Obukhov similarity parameterisation (Large and Yeager 2004; Eqs. 9b-c), and then combined with the model’s SST (model top-level potential temperature) and surface currents, to compute the turbulent fluxes at each model time step. The chosen atmospheric state also includes precipitation and runoff data (not discussed here), and radiative (downward shortwave and longwave) fluxes.

Some products (MOVEG2, CGLORS05v3 and GLORYS2v1/v3) applied a priori adjustments to atmospheric reanalysis data such as radiation and precipitation, to prevent biases associated with cloud parametrizations (Kallberg 1987). For example, CGLORS05v3 and GLORYS2v1/v3 applied corrections to radiative (shortwave and longwave) heat fluxes from the ERAi product by means of a large-scale climatological correction coefficient derived by the Global Energy and Water Cycle Experiment (GEWEX) project, following the method described by Dussin and Barnier (2013). Furthermore, two products use tuned Bulk Flux Forcing from the adjoint method (GECCO2, ECCOv4), where the surface fluxes are part of the control vector of the optimization problem (see Köhl 2015 for further details). GECCO2 fluxes are computed from Large and Pond (1981) bulk formula applied to NCEP-R1 fields, whereas ECCOv4 uses CORE bulk formula with all a priori forcing data taken from the ERAi product.

  • “Coupled Model Fluxes” for MOVE-C, CFSR and ECDA

These three products come from coupled models where the surface fluxes are a resolved part of the overall solution (ocean, atmosphere, land and sea ice states) to the coupled reanalysis (details provided in the cited references in Table 1). Although this approach should give more internally consistent flux treatments, these products may also be more weakly constrained by observational data.

Conservation of heat in ORA-IP

The majority of the ORA-IP products (except ECCOv4 and GECCO2) are affected by interior sources and sinks of heat associated with the temperature assimilation using sequential filtering schemes. This is also true for volume and salt conservation (see Schiller et al. 2013). When diagnosing heat budgets these additional sources/sinks of heat should be taken into account. To do this, we partition the total flux into the net heat flux, Qnet at the surface, plus a term arising from changes in temperature within the fluid column associated with assimilation increments for temperature, Qassim (referred to as the assimilation heat flux in the following):

$$Q_{tot} = Q_{net} + Q_{assim} ,$$

with Q net  = Q sw  + Q lw  + Q lat  + Q sen (the sum of the shortwave, longwave, latent and sensible heat fluxes), and Qassim determined (in heat flux units; W m−2) according to:

$$Q_{assim} = C_{p} \rho_{0} \int_{-H}^{0} \frac{\left( T_{obs} - T_{model} \right)}{\tau }dz,$$

where the vertical integral of the local temperature increment (T obs  − T model ) extends over the full ocean column from the bottom z = −H to the surface z = 0, and τ is the data assimilation time window (in sec), and ρ o  = 1035 kg m−3 and C p  = 4000 kg−1 J K−1 are representative sea water density and specific heat capacities, respectively.

Other surface flux products used in this comparison

The ORA-IP products are compared with other global air–sea heat flux data from in situ (ship) observations, satellite data, atmospheric reanalyses, or hybrid products (a combination of atmospheric reanalysis and remote sensing products), and locally, with buoy flux data measured at moorings (limited in both time and space). All additional flux products are listed in Table 2. These other global flux products are not references, but they do contribute to assessment of the uncertainty in the context of evaluating ORA-IP, while the point comparisons against local buoy data allow the seasonal cycle and annual mean fluxes from ocean reanalysis products to be evaluated and calibrated.

Table 2 Additional surface heat flux data used in this comparison study

Except for the buoy data (whose details are provided in Sect. 5), all datasets listed in Tables 1 and 2 are available on a monthly mean basis and have been interpolated to a common 1° by 1° grid. All quantitative comparisons in the following sections are performed with monthly mean data on this common grid.

Global time-mean surface heat fluxes

Spatial maps of Qnet

Global maps of the 17 year (1993–2009) time-mean net surface heat flux, Qnet, for 16 individual ORA-IP products are shown in Fig. 1. While the overall patterns are similar there are many notable differences in magnitude. Overall, the coupled products (14–16) appear more positive than other products over large areas (this is shown in Fig. 2 where products are ordered by their global means). In particular, ECDA shows additional warming in the southern hemisphere while CFSR has the strongest heating rates in the western tropical Pacific and north Indian ocean. Among the eight products (1–8) forced with ERAi fields, CGLORS05v3 (3) shows visibly more cooling, especially in the tropical oceans. The cause may be related to an over-correction of the radiative fluxes from the ERAi product in this analysis. It is also seen that high latitude regions are not available for ECCOv4 (2) or GODAS (10) products, and in PEODAS (11) the high latitudes are clearly anomalous compared to other products.

Fig. 1
figure 1

Global spatial distribution of net surface heat fluxes from ORA-IP products (listed in Table 1) averaged over their common years (1993–2009). (MOVECORE (panel 12) was based on the CORE.2 fluxes which end in 2007, and so was extended through 2009 using the monthly CORE.2 climatology). Positive is heat flux into the ocean. Contours indicate the locations of the 17-year mean zero net heat fluxes. Colorbar units are in W m−2

Fig. 2
figure 2

Global mean heat fluxes averaged over the 17-year period (1993–2009) along with their interannual STDs over this period. Sixteen ORA-IP Qnet (net surface heat flux) products (blue bars) along with the 16-member ensemble average (dark grey bar) are shown in comparison with other products derived from observations, atmospheric reanalyses or blended products (light grey bars) to the right-hand side, with the error bars representing interannual standard deviations. The products (labelled on the x-axis) have been ordered according to their global surface imbalance, from positive to negative. Nine ORA-IP products have assimilation heat fluxes, Qassim, defined in Eq. (2) (orange bars), along with total heat fluxes, Qtot = Qnet + Qassim (green bars). A common global ocean-land mask has been used for all estimates (see text for details). Positive is heat flux into the ocean. Units are in W m−2

Global heat closure

Figure 2 shows on the left global time-mean (1993–2009) total, net surface and assimilation heat fluxes from all ocean products, compared on the right to the global means of the other flux products listed in Table 2. The products (labelled on the x-axis of Fig. 2) have been ordered according to their global surface imbalance, from positive to negative. The 16-member ensemble mean of all ocean reanalysis fluxes is also shown (grey bar). A common global mask (all cells that are not flagged as land on the common grid) has been used for all estimates in Fig. 2 as this is the domain to which the whole Earth’s energy budget closure constraint applies (see e.g., Josey et al. 2013, Table 5.1).

For the ensemble of 16 ORA-IP products the 17-year average (±interannual STD) is 4.2 ± 1.1 W m−2, which increases to 6.8 ± 1.3 W m−2 when averaged ±60° latitude to avoid areas with some missing data (seen in Fig. 1; other models also have very crude treatments for sea–ice). The 2.6 W m−2 difference in ocean heat gain reflects the high latitude cooling regions and equates to a poleward heat transport of 0.32 PW across 60°N into the Arctic, comparable to observational estimates of 0.28 PW across 55°N in the Atlantic (Bacon 1997). However, the 6.8 W m−2 of ocean heat gain over ±60° latitude would clearly imply a much larger poleward heat transport, but is instead balanced by assimilation terms and heat storage gains (see below).

The ocean reanalysis net surface heat flux products (Qnet, blue bars in Fig. 2) have global imbalances in surface heating ranging from +13 W m−2 (for coupled products ECDA, CFSR) to −10 W m−2 (CGLORS05v3), generally showing a similar level of closure to the atmospheric reanalysis products, between +11.5 W m−2 (MERRA) and −15 W m−2 (JRA-55). The independent ship and satellite products (e.g., NOC2.0, ISCCP/J-OFURO and ISCCP/OAFlux) have larger imbalances, +15–25 W m−2, although these may be biased by the ISCCP radiation product.

The ocean reanalyses generally have a much smaller global heat budget residual when the data assimilation terms, Qassim defined in Eq. (2), are taken into account to produce an equivalent total heat source, Qtot = Qnet + Qassim (green bars in Fig. 2). For seven of the products this total flux, Qtot, gives a heat gain between 0.5 and 2 W m−2, which is close to recent estimates of heat content change from combined XBT and Argo data, which give warming rates of 0.64 ± 0.11 W m−2 (0–700 m) between 1993 and 2008 (see Loeb et al. 2012 and Roemmich et al. 2015). For most of the products the assimilation terms, Qassim (orange bars in Fig. 2), represent a global ocean heat loss, countering the heat gain through the resolved surface heat flux, and we will see below that this cancellation mostly takes place in the top 100 m.

Oceanic feedback on heat fluxes through SST

Figure 3a presents timeseries for the 60°S–60°N averaged sea–air temperature differences (upon which the turbulent heat fluxes sensitively depend) for six ORA-IP products forced by CORE Bulk Formula using ERAi surface fields. The ERAi line (bold dashed black) represents the difference between the air temperatures and the SSTs originally used as atmospheric lower boundary conditions (see Dee et al. 2011), and this is seen to be fairly steady ~1.1° C. All products show positive values reflecting the ocean being warmed by radiation and needing to cool through turbulent fluxes. Some results are relatively steady like ERAi, whereas others show significant discrepancies, for instance, UR025.4, GloSea5 and GLORYS2v1/v3 all show smaller values in earlier years suggesting that the assimilated satellite SSTs (a combination of AATSR, AVHRR and AMSRE data) are cooler than the Reynold’s SSTs used to determine ERAi air temperatures.

Fig. 3
figure 3

Year-to-year variability of global (60°S–60°N) averaged: a Sea–air temperature (SST–Tair) from a subset of ORA-IP products forced by CORE Bulk Formula using ERAi surface fields in comparison with the ERAi product itself; b Net surface heat fluxes (the sum of shortwave, longwave, latent and sensible heat fluxes) from the ERAi product and as calculated from the six ORA-IP products; c Assimilation heat fluxes arising from changes in temperature associated with assimilation increments for temperature over the upper 100 m (calculated from Eq. (2) with the vertical integral extending from the surface to 100 m)

The net surface heat fluxes into the ocean (Qnet) from ERAi and from the six ORA-IP products, are shown in Fig. 3b. The SST rise seen in some of the products over years 1993–2009 (Fig. 3a) leads to a decrease in Qnet (i.e., an increase in ocean cooling dominated by latent heat), whereas the ERAi Qnet shows no obvious trend. The assimilation heat fluxes in the top 100 m for these six products (calculated from Eq. (2) with the vertical integral now extending from the surface down to 100 m) are shown in Fig. 3c. These are generally negative (cooling the upper 100 m of the ocean, consistent with the full depth increments from Fig. 2) and are also decreasing with time. For example, the surface heat flux change of more than 10 W m−2 in GloSea5 is balanced by a decrease in the upper layer assimilation cooling by about the same amount. This is strong evidence that there is compensation between near surface assimilation increments (mostly resulting from SSTs) and the applied surface heat fluxes. This does suggest that assimilation increments may well contain useful information about surface heat flux biases.

Ensemble comparisons of mean surface heat fluxes

Figure 4 shows the ORA-IP ensemble mean and spread (STD) of the 17-year (1993–2009) time-mean net surface heat flux, Qnet, and also the spread of the net radiative Qrad (shortwave plus longwave), and turbulent Qtur (latent plus sensible) heat fluxes using all products in Table 1.

Fig. 4
figure 4

a Ensemble mean of the 17-year (1993–2009) time-mean net surface heat flux (Qnet) from the 16 ORA-IP products listed in Table 1. Ensemble spread (STD) of time-mean, b Qnet (the sum of the shortwave, longwave, latent and sensible heat fluxes), c net radiative Qrad (shortwave plus longwave) fluxes, d net turbulent Qtur (latent plus sensible) heat fluxes, and e the sum of spread in net radiative and turbulent heat fluxes (STD Qrad + STD Qtur). f Correlation (CORR) between ensemble member Qrad and Qtur fluxes over the same period. Positive values in Plate a indicate heat flux into the ocean (W m−2). The ensemble spread in Qrad (Plate c) is computed using 10 different shortwave radiation products (ERA40, ERAi, CORE.2, JRA-55, NCEP-R1, NCEP-R2, ERAi with corrections, MOVE-C, CFSR, and ECDA) and 16 different longwave radiation products (because these depend on the model’s SST). Contours of 15 W m−2 (black solid lines on Plates bd) indicate the global average of spread in long-term mean Qnet

The ensemble mean Qnet (Fig. 4a) is in broad agreement with the climatological fluxes based on bulk formula applied to observations (e.g., Berry and Kent 2009), with the tropical oceans showing heat gain, and the subtropics and high latitudes heat loss, especially in the vicinity of the western boundary currents (WBCs). In most areas the ensemble spread in Qnet (Fig. 4b) is dominated by turbulent heat fluxes (Fig. 4d), with the largest spreads (>40 W m−2) occurring over the northern hemisphere WBCs, as well as in the Southern Indian ocean south of 20°S. There is also significant spread contributed by net radiation (~25 W m−2), particularly around the cooler upwelling regions off the west coasts of continents (Fig. 4c). If short and longwave radiation spreads are independently assessed they are found to partially compensate, being anti-correlated in most regions (not shown). These differences in shortwave radiative fluxes are related with problems simulating clouds in the atmospheric reanalysis products (ERAi, NCEP-R2 and JRA-55) and also to errors in the coupled models (e.g., the shortwave flux in CFSR is much too strong in the western tropical Pacific compared to satellite radiation from ISCCP and CERES products). Clearly STD Qnet (Fig. 4b) ≪ [STD Qrad + STD Qtur] (Fig. 4e) indicating anti-correlation between Qrad and Qtur between members of the ensemble (Fig. 4f); standard deviations are shown rather than variances because the scales are easier to interpret. The exceptions being the area south of Japan where radiation dominates the spread in fluxes, perhaps due to difference in representing aerosols downstream of China, and in the southwest Indian ocean, where correlations between Qrad and Qtur are also positive.

Implied ocean heat transports

Zonal averages in the net surface heat fluxes can be used to infer a meridional ocean heat transport that is consistent with a steady state ocean circulation under these fluxes. Figure 5a, b show inferred meridional heat transports for the ensemble mean, and each of the ocean reanalyses. Figure 5a uses the resolved surface heat fluxes, i.e., Qnet = Qsw + Qlw + Qlat + Qsen, and Fig. 5b uses the total heat fluxes including the assimilation terms, i.e., Qtot = Qnet + Qassim (Qassim defined in Eq. 2). The fluxes for the reanalyses UR025.3/4 have been previously compared to the advective heat and freshwater transports in Haines et al. (2012) and Valdivieso et al. (2014a); however, the full reanalysis transports have not yet been compared for the full range of reanalyses shown here. In Fig. 5a, b all surface fluxes are integrated starting in the south (i.e., the Antarctic continent) and working northwards. At a number of latitudes an independent meridional heat transport is provided based on inverse modelling using hydrographic section data from Ganachaud and Wunsch (2003) and Lumpkin and Speer (2007).

Fig. 5
figure 5

a Global Meridional Heat Transport (MHT) inferred from integrated ORA-IP surface heat fluxes (starting from the south) and a steady state assumption (16 colour lines with the shading indicating ORA-IP ensemble spread). b MHT inferred from integrated surface heat fluxes adjusted by assimilation increments (nine products). The red symbols represent WOCE-based inverse model estimates at control sections from Ganachaud and Wunsch (2003) and Lumpkin and Speer (2007). Assimilation increments generally improve agreement with external transport estimates outside the tropical region ±10°. Positive numbers indicate northward transport. Units are in PW

Figure 5a shows a rapid divergence of results moving northwards through the southern ocean, with the largest global heat flux imbalance in ORA-IP, ~13 W m−2 for CFSR and ECDA (Fig. 2), corresponding to an integrated transport discrepancy in the north ~+5 PW. The ensemble spread from the ORA-IP products is shaded around the mean, and we see that this area grows rapidly in the southern ocean, remains steady in the southern subtropics, then grows again crossing the tropics, but remains fairly stable through the northern subtropical and higher latitudes. This suggests that the largest uncertainty in net surface heat fluxes occurs in the southern oceans and in the tropics, which is broadly consistent with the Qnet spread map in Fig. 4b.

When the assimilation terms are added in Fig. 5b, the reanalysis products are brought much closer to a steady balance, with most products showing less than 0.5 PW residual transports at 80°N, reflecting a global net flux imbalance now less than 1.5 W m−2. The shaded area in Fig. 5b now represents the 9-member ORA-IP spread which is greatly reduced compared to Fig. 5a (the same 9 ORA-IP members on Fig. 5a have almost the same mean and spread as shown for all 16). However, in some of the products there are now highly anomalous implied ocean transports appearing near the equator. The anomalous transports are symmetric with strong values away from the equator, but converging just to the north and south, without obvious influence at higher latitudes (the shaded spread increases at the equator but is reduced again further north). The problem is most likely related to difficulties in producing a realistic equatorial dynamical balance in the models due to the lack of a pressure gradient bias correction in some of the data assimilation systems (see e.g., Bell et al. 2004; Balmaseda et al. 2007, for further details).

Evaluation of flux variability

In this section we look at the ORA-IP ensemble consistency of the net surface heat flux temporal variability, both for the seasonal cycle and on interannual timescales. We also evaluate regional co-variability of surface heat fluxes and SSTs from ORA-IP and other products on interannual time scales. Note that the comparisons hereafter are limited to the surface heat fluxes (Qnet) because the assimilation fluxes (Qassim defined in Eq. 2) are not available for some of the products.

Surface heat flux seasonal cycles

The consistency of the seasonal cycles among all ORA-IP Qnet (net surface heat flux) products in Table 1 is presented in Fig. 6. Figure 6a shows the standard deviation of the monthly mean (1993–2009) ensemble average Qnet (note that the mean flux biases are removed from each product). This increases away from the tropics and is largest around the northern WBCs where warm water is always present off the continental shelves, and where large heat losses in winter can be sustained when cold air flows off the continents. Figure 6b looks at the typical variability amongst seasonal cycles from each model based on monthly deviations from the ensemble mean seasonal cycle. Figure 6c, d show the same but for only the northern winter months (December–February) and southern winter months (June–August), respectively. The 10 W m−2 contour is highlighted showing that the monthly mean variability amongst all the products generally differ by less than this over large areas. The northern western boundary currents, especially in the winter months, show the largest differences (>25 W m−2); monsoon upwelling areas off east and west Africa and the Arabian Peninsula also show large variability (due to shortwave and latent heat flux variability) in northern summers.

Fig. 6
figure 6

a Seasonal STD of Qnet (net surface heat flux) estimates (16 products) computed from the monthly climatology over 1993–2009 (the mean Qnet over the 17-year base period has been removed from each product). Ensemble spread of the 17-year mean seasonal cycle, b annual average (January–December), c Northern winter season (December–February), and d Northern summer season (June–August). The 10 W m−2 contour is highlighted

Interannual heat flux signals

To determine the robustness of the interannual net surface heat flux signals in the ensemble of products, Fig. 7 shows the Signal to Noise ratio for the interannual variability over the period 1993–2009. The Signal (Fig. 7a) is the temporal standard deviation of the ensemble annual mean Qnet anomalies from the 1993–2009 average, and the Noise (Fig. 7b) is the product standard deviation around the ensemble mean averaged over the same period. Two versions of the Signal to Noise ratio are shown; Fig. 7e using all the products, except PEODAS, and Fig. 7f using only 12 products, i.e., without the coupled reanalyses. The temporal STD of ensemble annual mean turbulent (latent plus sensible) heat flux (Qlat + Qsen) and SST anomalies from 15 products except PEODAS over the same period, are also shown in Fig. 7c, d respectively, for comparison with Fig. 7a.

Fig. 7
figure 7

Interannual signals over 1993–2009 of a net surface heat fluxes, c turbulent (latent plus sensible) heat fluxes and d SSTs—all estimated from yearly anomalies relative to the 17-year (1993–2009) mean using all ORA-IP products, except PEODAS. b Noise estimated as the product STD around the ensemble mean averaged over the same period. e Signal to Noise Ratio for the ORA-IP Ensemble Qnet (15 products) and f Signal to Noise Ratio using only 12 products, excluding the coupled reanalyses. The solid contours on Plates ef indicate the location of Signal/Noise ratios of 1

The strongest interannual signals in Qnet (Fig. 7a) are dominated by variability in latent and sensible heat loss from the ocean surface (Fig. 7c) and occur primarily in the El Niño-Southern Oscillation (ENSO) region in the tropical Pacific, where Qnet anomalies are strongly co-located with SST anomalies (Fig. 7d). Figure 7e shows interannual Signal to Noise Ratio up to 2 throughout the equatorial Pacific, reflecting the detection of ENSO, with the areas of detectable signal spreading to 20° N/S in the western Pacific. In the North Pacific, there is a suggestion of consistent interannual signal near the Gulf of Alaska, where values reach 1.2–1.3, that may be associated with the Pacific Decadal Oscillation (PDO). However, in other areas of large flux variability, such as the northern WBCs and their extensions, there is little detectable interannual signal due to large noise (>20 W m−2) among the products (Fig. 7b). The coupled products contribute to a larger ensemble noise in the tropics (particularly the CFSR; Fig. 1) and when removed, the Signal/Noise in Qnet increases substantially, especially in the western Pacific warm pool (Fig. 7f).

To develop a more detailed comparison of these signals, Fig. 8 shows a set of longitude-time (Hovmöller) plots of the monthly latent heat flux (Qlat) anomalies (removing the 1993–2009 mean seasonal cycles) for the ENSO region of the tropical Pacific (5°N–5°S, 130°E–80°W) from 1993 to 2009. Eleven of the ORA-IP products are represented (wherever latent fluxes are available separately) along with the 11-member ensemble mean, and eight other latent heat flux products, including atmospheric reanalyses, ship- and satellite-based or hybrid products.

Fig. 8
figure 8

Monthly latent heat flux (Qlat) anomalies relative to the 17-year (1993–2009) base period in the Tropical Pacific (130°–280°E, meridionally integrated between 5°S and 5°N). Eleven of the ORA-IP products are represented along with the 11-member ensemble average (Plates al) in comparison with eight other latent heat flux products, including atmospheric reanalyses, ship and satellite derived products or blended products. Positive anomalies indicate more than normal oceanic heat loss due to increased latent heat flux

Differences in the strength and location of the heat flux anomalies associated with the El Niño 1997/1998 and La Niña 1999/2000 are clearly seen in Fig. 8. During the build up of El Niño in 1997 there is strong warming of the eastern equatorial cold tongue that coincides with the largest increase of latent heat loss (Qlat > 0) around 220–260°E in all products, except NOC2.0 which is too noisy (Fig. 8m). The La Niña phase from 1999 to 2000 is marked by the opposite behaviour, i.e., large negative Qlat anomalies (reduced ocean heat loss) can be seen in most products, responding to cool SSTs, but unlike the 1997/1998 flux anomalies, they tend to spread eastward as their amplitudes decrease. Among the ORA-IP products, the coupled CFSR and ECDA are clear outliers, with CFSR showing large flux anomalies in the central Pacific prior to 1997 and ECDA with positive/negative anomalies in the western Pacific (130°–180°E) from 1993 to 2001, that are absent in all other products. Most of the ERAi-forced ORA-IP products have spurious (non physical) positive flux anomalies coincident with the Tropical Ocean–Atmosphere (TAO) mooring lines, particularly from 1993 to 1995, that are not in the other products, including in the original ERAi product itself (Fig. 8q). Josey et al. (2014) noted this anomaly pattern may be related to assimilation of near-surface data from the TAO array in the ERAi reanalysis. The satellite and hybrid products show quite consistent Qlat anomaly patterns (Fig. 8n–p), whereas NCEP-R2 (Fig. 8r) has very weak El Niño 1997/1998 and large anomalies in the central Pacific after 2001.

We now look at the consistency in reproducing surface flux anomalies during a particular anomalous year. Figure 9 shows ensemble ORA-IP SST (Fig. 9a) and Qnet (Fig. 9b) anomalies in 2008 (relative to the period 2001–2009) in the Pacific sector. The SST anomaly pattern in 2008 is associated with the Pacific Decadal Oscillation (PDO) (the most negative since 1971; Peterson and Baringer 2009), with negative anomalies along the west coast of North America from Alaska to the equator, and positive anomalies in the area to the west extending to up 30°N/S. Corresponding net surface heat flux anomalies (positive Qnet corresponds to oceanic heat gain) can be seen in the North Pacific up to 40°N, with anti-correlation, i.e., increased Qnet (due to reduction in latent and sensible heat loss) over negative SST anomalies and reduced Qnet over positive SST anomalies, indicating where surface heat fluxes act to damp SST anomalies that have been generated directly by the atmosphere. For comparison with the ORA-IP ensemble output, Fig. 9c, d show the 2008 Qnet anomalies from the ERAi product and the CERES radiation combined with the OAFlux dataset, respectively. Both the ORA-IP ensemble and the ERAi reanalysis are capable of producing a very similar pattern of variability to that obtained from the combined satellite CERES radiation and OAFlux fields for much of the Pacific basin. This result is encouraging for studies that rely mostly on the use of atmospheric reanalysis-based radiation with potential biases with simulated clouds.

Fig. 9
figure 9

a Pacific basin-scale SST anomalies in 2008 (relative to 2001–2009) from the ensemble of ORA-IP (15 products, except PEODAS) and the corresponding anomalies in net surface heat fluxes, Qnet, as derived, b from the ORA-IP ensemble, c from the ERAi reanalysis and d from CERES radiation combined with OAFlux turbulent fluxes. The solid contours represent the location of the zero SST/Qnet anomalies. Positive Qnet anomalies represent more heat than normal going into the ocean in areas of mean net heat gain (tropics and eastern Pacific) or less heat than normal being lost from the ocean in areas of mean net heat loss (e.g., WBCs and their extensions), see Fig. 4a for the climatological location of the zero net heat flux

We also consider surface heat flux anomalies in 2009 in the North Atlantic (Fig. 10) between July and December, which are associated with a persistent negative phase of the North Atlantic Oscillation (NAO) (Cayan 1992), which started in July 2009 (Arndt et al. 2010). The NAO tripole pattern is clearly seen in the Qnet anomaly pattern from the ORA-IP ensemble (15 products except PEODAS) and the CERES/OAFlux combined product (Fig. 10a, b, respectively). Differences between the ensemble of ORA-IP and CERES/OAFlux toward the mid-latitude western boundaries suggest incorrect positioning of the Gulf Stream and the North Atlantic Current, which is too zonal in some of the models. The ERAi atmospheric analysis itself gives the surface heat flux product in Fig. 10c, although this is influenced by the fixed daily SST used at the surface boundary. An alternative way of using the atmospheric reanalysis data is to combine CERES TOA (top-of-atmosphere) fluxes with a correction based on vertically-integrated ERAi heat flux divergences in the atmosphere, to derive a net heat flux at the ocean–atmosphere interface. Such a product is documented in Liu et al. (2015) and the results can be seen in Fig. 10d. This flux product now contains short-scale anomalies associated with atmospheric winds that are not seen in the original ERAi net heat flux anomalies (Fig. 10c), but is otherwise consistent with the CERES/OAFlux combination.

Fig. 10
figure 10

Net surface heat flux (Qnet) anomalies between July and December 2009 (computed as departures from 2001 to 2009 monthly means) in the North Atlantic sector: a ORA-IP ensemble using 15 products, except PEODAS, b CERES net radiation combined with OAFlux turbulent fluxes, c ERAi product (the sum of the shortwave, longwave, latent and sensible heat fluxes) and d top of atmosphere (TOA) CERES net radiation combined with ERAi transport divergences. The solid contours represent the location of the zero Qnet anomalies. Positive Qnet anomalies represent more heat than normal going into the ocean in areas of mean net heat gain (low latitudes regions south of 30°N) or less heat than normal being lost from the ocean in areas of mean net heat loss (high latitude regions north of 30°N)

Ensemble comparisons with buoy flux data

Local evaluation of gridded surface flux products e.g., from models or satellite data, against in situ flux measurements is really only possible at a few locations where conditions controlling surface fluxes have been monitored by meteorological buoys for periods of a year or more to cover the seasonal cycle. In this section, we present a comparison between monthly mean ORA-IP surface heat fluxes, and the corresponding fluxes derived from buoy measurements taken from the operational tropical moored buoy arrays and the Woods Hole Oceanographic Institution (WHOI) Stratus buoy which is located in the eastern South Pacific. Such comparisons are very important to distinguish between the many available gridded flux products, as noted by Yu et al. (2013).

Comparison with tropical buoys

The comparisons begin with tropical buoys from the TAO array in the tropical Pacific (McPhaden et al. 1998), the RAMA array in the tropical Indian (McPhaden et al. 2009) and the Pilot Research moored Array (PIRATA) in the tropical Atlantic (Servain et al. 1998; Bourlès et al. 2008). These buoy flux data are available through the OceanSITES project ( and include radiative (shortwave and longwave) and turbulent (latent and sensible) fluxes computed using the COARE3.0b flux algorithm (Fairall et al. 2003). Here we use eight buoy deployments within the 10°S–15°N latitude band for the 24 month period Jan 2008–Dec 2009 (except Jan–Dec 2007 for the PIRATA site 15°N38°N and Jan–Dec 2009 for the RAMA site 15°N90°E), when all four components of the heat flux are available. We note that long-moored timeseries are available at some locations (e.g., the TAO sites on the equator), but they are not continuously distributed in time. A model-data comparison for the full ORA-IP time frame 1993–2009 would require dealing with any gaps in the buoy data and interannual variations in surface heat flux computations associated with ENSO and is beyond the scope of this manuscript.

Figure 11a shows annual mean net heat flux and the individual flux components at the eight buoys, along with monthly standard deviations about the annual means representing mainly the seasonal cycle. There is net heat flux into the ocean at all locations, with a maximum heating rate ~175 W m−2 at the TAO site 0°N110°W in the eastern Pacific cold tongue. The net heat fluxes off the equator at 15°N and 10°S are considerably less than the net heat flux at the equator, reflecting wind-induced evaporation, particularly at the southeast PIRATA site 10°S10°W. The longwave and sensible heat fluxes vary rather little between buoys, and also have small seasonal variability, so that shortwave and latent heat fluxes are the dominant heat flux terms. Wang and McPhaden (2001, Appendix B) estimated the accuracy range for annual averaged flux components from TAO buoys (at 110°W, 140°W, 170°W and 165°E) to be: (1, 7–12, 6, 12–19 W m−2) for (sensible, latent, longwave, shortwave) fluxes, respectively. The net flux errors (assuming they are uncorrelated) are then 17, 23, 18, and 16 W m−2 at 165°E, 170°W, 140°W and 110°W, respectively, always dominated by the shortwave errors.

Fig. 11
figure 11

a Observed annual mean net surface heat fluxes and their individual flux components (i.e., shortwave, longwave, latent and sensible heat fluxes) at eight buoy locations of the operational tropical moored buoy arrays (10°S–15°N) over 2007–2009. The error bars represent monthly standard deviations. Positive values indicate heat flux into the ocean (W m−2). b Mean flux differences from buoy, (Ensemble ORA-IP–Buoy), derived from 11 ORA-IP products (listed in bold in Table 1), interpolated to each buoy location and averaged over the same period, with the error bars representing the spread among the various ORA-IP products. “All Buoys” indicates mean fluxes/differences from buoy averaged across all eight buoys (see text for details)

Figure 11b shows (Ensemble ORA-IP–Buoy) differences using buoy annual heat fluxes and ensemble fluxes from the 11 products with all component fluxes available (listed in bold in Table 1), interpolated to each buoys location and sampling time period, with the STD now indicating spread among the ORA-IP products about the ensemble mean differences. “All Buoys” indicates average differences across all eight buoys.

Differences (biases) for nearly all flux components are negative indicating overestimation of ocean heat loss or underestimation of ocean heat gain. Shortwave biases, typically −6 ± 6 W m−2, suggest the reanalysis downward shortwave fluxes are slightly too weak on average. Longwave net losses are 5 ± 3 W m−2 too strong compared to the buoys indicating that the reanalyses have either a too strong upward component or too weak downward component. Both radiation biases contribute to cooling the ocean model surface. Sensible heat flux biases are smaller leading to ~3 ± 1.5 W m−2 additional cooling in the reanalysis products. However, the latent flux biases dominate, with ~15 ± 9 W m−2 of greater surface cooling in the ORA-IP ensemble. The resulting net heat flux into the ocean in ORA-IP is 10–41 W m−2 (averaging at 29 W m−2) too weak across all buoys, compared to the buoy average net flux of ~87 W m−2. None of the flux component errors are correlated with the observed buoy fluxes themselves (as seen from the distributions in Fig. 11a, b). However, flux component errors are correlated between ensemble members so that the range in net flux errors shown in Fig. 11b are typically smaller than would be expected based on errors in the components. The dominant effects are anti-correlated shortwave and longwave flux errors and anti-correlated shortwave and latent flux errors.

Figure 12 shows a more detailed comparison in the western Pacific warm pool (165°E) and the equatorial cold tongue (110°W), with individual members of the ORA-IP ensemble and other products in Table 2, including the ICOADS-based product, five satellite-based or hybrid products and four atmospheric reanalysis products.

Fig. 12
figure 12

Mean heat flux component differences from two TAO locations of the equatorial Pacific: a the western Pacific warm pool (165°E) and b the equatorial cold tongue (110°W) for the period 2007–2009. The dashed vertical lines indicate the standard error for seasonal variations in net heat fluxes (Net) from TAO data based on Wang and McPhaden (2001, Appendix B)

Negative offsets in nearly all components in all products indicate the excess cooling in ORA-IP, with biases ~25–40 W m−2 in net flux not uncommon, well outside of the TAO error estimates of ±17 W m−2 at 165°E and ±15.6 W m−2 at 110°W, also dashed in Fig. 12. Among the ORA-IP products, MOVEG2 and the coupled CFSR and ECDA reanalyses show the greatest agreement at 165°E, within error bounds of TAO data, but with bias compensation between radiative and latent flux components. Satellite radiation (ISCCP, CERES) combined with OAFlux, J-OFURO or HOAPS3.2 turbulent fluxes are all within the TAO error estimates, while the net flux bias in the ship-based NOC2.0 product is much larger, especially in the cold tongue region where difference reaches ~−60 W m−2. These NOC2.0 differences come from the turbulent (latent and sensible) components, perhaps suggesting different biases in the bulk variables in the flux computation. For the atmospheric reanalysis products, differences of 35–75 W m−2 less heat into the ocean are dominated by latent and shortwave fluxes, similar to the ORA-IP biases. The NCEP-R2 product underestimates shortwave radiation by about 50 and 65 W m−2 compared with TAO. As was pointed out by Wang and McPhaden (2001), this bias in shortwave is probably due to using too high surface albedo over the ocean. In contrast, each component of the MERRA flux, as well as the net flux, is within 10 W m−2 of the TAO values in the western Pacific warm pool.

The tropical comparisons in Figs. 11 and 12 contrast with the global average results in Fig. 2 which show nearly all ORA-IP products gaining heat, probably at a higher rate than would be consistent with Argo data alone (e.g., Loeb et al. 2012).

Comparison with WHOI Stratus buoy

Outside the equatorial band a long-term (2000–2010) “Reference Timeseries” of in situ flux measurements from the WHOI Stratus buoy in the eastern subtropical South Pacific (mean location, 19.9°S, 85.3°W), has recently been made available to us (Bob Weller personal communication, 2014). The computation of the Stratus buoy fluxes uses hourly near-surface meteorological data and the COARE3.0b flux algorithm (Fairall et al. 1996). The height of the sensors above the sea surface is 2.3 m for humidity and air-temperature, which compares favourably with the 2 m height at which temperature and humidity data are available from reanalyses. In addition, stability-dependent height corrections are applied to the buoy winds in order to adjust the 2.7 m anemometer measurements to 10 m height winds from reanalyses. Details of the buoy instrumentation and the bulk flux algorithm, along with the accuracy of the computed buoy fluxes, are documented in Weller (2015) (see also In this comparison we use monthly mean fluxes from the Stratus buoy over a 9-year measurement period from January 2001 through December 2009 inclusive (108 months), within the ORA-IP time frame 1993–2009.

Figure 13 shows monthly timeseries of observed surface heat fluxes at the Stratus buoy (dashed blue lines) and the corresponding estimates derived from the ensemble of 11 ORA-IP products extracted from the grid points nearest to the buoy. The Stratus 9-year mean net heat flux into the ocean is 38 W m−2, with shortwave heating (191 W m−2) balanced by cooling dominated by latent heat (−103 W m−2), and to a lesser extent longwave radiation (−43 W m−2). The accuracy of the monthly mean surface heat fluxes from the Stratus buoy data are estimated by Weller (2015) to be: ±3 W m−2 for net shortwave heat flux; ±2 W m−2 for net longwave heat flux; ±4 W m−2 for latent heat flux; and ±1.5 W m−2 for sensible heat flux. There are compensating errors in flux components and the error in net surface heat flux is estimated to be ±8 W m−2.

Fig. 13
figure 13

Monthly timeseries of observed surface heat fluxes at the Stratus buoy (dashed blue lines) and the corresponding estimates derived from the ensemble of 11 ORA-IP products with all component fluxes available, see Table 1 (red lines with the pink shading indicating ensemble spread). Also shown to the right are the mean seasonal cycles in heat flux components over the period 2001–2009. Positive represents heat flux into the ocean. Units are in W m−2

The ORA-IP ensemble reproduces the seasonal cycle in net heat flux dominated by variability in net shortwave radiation (Fig. 13a, b), although the net fluxes in the models are systematically lower than the buoy, with ~16 W m−2 (42 %) less heat going into the ocean, nearly all of which (14 W m−2) is additional latent heat cooling. This clearly exceeds the error bounds of the computed net heat fluxes from the Stratus buoy by at least 50 %. Figure 13b–e show that seasonal component differences among ORA-IP products can be large: shortwave flux is overestimated, especially in spring and summer (October–January; Fig. 13b) and is compensated by more cooling due to latent, longwave and sensible fluxes in all seasons (Fig. 13c–e).

Figure 14a, b show ORA-IP ensemble differences from the buoy flux components using only the six ORA-IP products forced by ERAi using CORE bulk formula. Monthly differences in shortwave radiation are out of phase with longwave radiation (anti-correlation of −0.67), so that the net radiation bias is reduced in Fig. 14a. Differences in net flux are strongly correlated (up to 0.9) with differences in latent heat flux which dominate the turbulent fluxes, indicating that latent heat fluxes explain nearly all the variability in the differences between the buoy and the ORA-IP ensemble. The latent heat flux differences are in turn primarily explained by differences in surface wind speed (grey curve in Fig. 14b), with the ERAi winds being systematically too high (positive differences) compared to the buoy data, and with flux difference having significant correlated high frequency variability (~−0.6) associated with wind speed differences between ERAi and the buoy measurements.

Fig. 14
figure 14

Monthly differences from the Stratus buoy, Ensemble ORA-IP (ERAi)–Stratus Buoy, using only the six ORA-IP products forced with ERAi fields using the CORE bulk formula: a radiative flux components and net radiation (short plus long wave heat fluxes), and b turbulent (sensible plus latent) and net heat fluxes along with surface wind speeds. Positive differences in surface wind speed (in m s−1), ERAi–Buoy, indicate overestimation of ERAi compared to the buoy data. Positive flux differences (in W m−2) indicate larger ocean heat gain (shortwave and net fluxes) in the ORA-IP products and negative differences indicate larger ocean heat losses (longwave, latent and sensible fluxes) in ORA-IP compared to the buoy values

All other bulk variables, including SST, air temperature and relative humidity (not shown) show positive biases (i.e., overestimation compared to the Stratus buoy observations) during 2001–2009. Mean offsets for SSTs are between 0.7 and 1.3 °C, but these differences only weakly correlate (<0.2) with latent flux biases in the ORA-IP ensemble, indicating that SSTs make only a small contribution to the differences in latent and net heat fluxes in the ocean reanalysis estimates.

Coincident with the variability discussed above, Weller (2015) found that increases in Stratus buoy wind speeds and latent heat fluxes over the period 2001–2009 (primarily in spring and fall) were accompanied by a decrease in net heat flux of 39 W m−2 or 104 % of the 9-year mean. A more extensive comparison with the WHOI Stratus buoy is now underway, trying to identify interannual signals more clearly, including looking at linear trends in ORA-IP and other gridded surface flux products over 2001–2009 for comparison with the buoy observations.

Summary and discussion

This paper looks at the surface heat fluxes from ocean reanalyses taking part in the GODAE/CLIVAR-GSOP Intercomparison Project ORA-IP. The emphasis is on the degree of agreement between products and, therefore, ensemble results are shown for means and spreads over the period 1993–2009, along with signal to noise ratio results to assess flux variability. Comparisons with available surface heat flux datasets, including local buoy measurements, are also made to highlight differences in input data and methodology.

Global mean heat fluxes (Fig. 2) generally show small net positive bias (i.e., heat flux into the ocean) compared with larger biases in observational based products (e.g., NOC2.0 and ISCCP/OAFlux). For products where data are available, the positive flux bias is largely compensated by the assimilation increments removing heat globally. Total heat gains (surface flux + assimilation increments) are typically 1–2 W m−2, much smaller than most atmospheric reanalyses, but still larger than is realistic from Earth’s energy budget considerations (Loeb et al. 2012). We also show that compensation between resolved turbulent (latent and sensible) heat fluxes and assimilation heat loss largely takes place in the upper 100 m (Fig. 3).

The global compensation between surface heat fluxes and assimilation increments in near surface layers suggests that assimilation increments do contain information about surface heat flux errors that might be used in a similar way to Stammer et al. (2004), who use 4DVAR assimilation to generate surface heat flux corrections. Indeed, the GECCO2 product shown here already contains such corrections leading to a nearly closed global budget in Fig. 2. However, using the assimilation increments regionally and in time for flux correction would present many challenges. The increments are also correcting for errors in both horizontal and vertical heat transports (mixing errors near the surface), and will also depend on the distribution of available data to assimilate. An example of such transport errors shows up at the Equator in several products. In general the implied meridional heat transports are brought into much closer agreement with independent observations by combining the surface fluxes and assimilation increments (Fig. 5b); however, large anomalous transports around the equator in some products can be seen where assimilation is unable to correct thermocline depths, suggesting that better bias corrections, e.g., Bell et al. (2004), may be needed.

Variability amongst ensemble members exhibit anti-correlations between short and long wave flux components, and between shortwave and latent heat flux components, consistent with these terms dominating local heat budgets nearly everywhere (i.e., ocean heat flux divergence variations are smaller). The ensemble mean seasonal cycle is highly consistent between members, with most areas showing monthly noise spread <10 W m−2 (Fig. 6). However, the interannual signal/noise ratios in net surface fluxes are generally less than one in most areas, except in the tropical Pacific, where ENSO introduces large interannual signals which are captured well in the ensemble, Fig. 7e, f. We also show consistency in surface flux anomalies between ORA-IP and satellite radiation combined with OAFlux turbulent fluxes during two other interannual events, a large negative PDO in the Pacific in 2008, and a persistent negative NAO period in the Atlantic in the later part of 2009 (Figs. 9, 10).

The reanalysis products are also compared to tropical buoy fluxes for 2007–2009 (when all flux components from eight tropical buoys are available), and to a long-term (2001–2009) reference dataset of fluxes at the Stratus buoy in the eastern south Pacific. All the reanalysis products show underestimation of ocean heat gain at the tropical buoys (one-third smaller Qnet than observed) primarily due to latent heat flux errors and, to a lesser degree, short and longwave radiation errors (Fig. 11). At the Stratus buoy, the ORA-IP ensemble again underestimates ocean heat gain and the temporal variability of this bias is completely dominated by latent heat flux errors caused by differences in the surface winds imposed in the reanalysis models compared with those measured directly at the buoy (Fig. 14b). This is a strong indication that better surface winds are a likely prerequisite for better surface fluxes. Given the strong relationships between SST gradients and surface winds this suggests coupled reanalyses may provide improved results (see further below).

These reanalysis products were not designed with air–sea fluxes in mind and the assessment here is relying on the influence of upper ocean temperatures, which are being controlled through data assimilation, to positively influence the surface heat fluxes. Song and Yu (2013) uses a cage or pool-area heat budget analysis together with in situ surface and subsurface measurements to examine the consistency of nine flux climatologies (from atmospheric reanalyses or blended products) and to identify uncertainties, in the western Pacific warm pool region. Such diagnostics using in situ ocean data have some potential to validate surface flux products and remove flux biases, but can only be used in regions with small lateral exchanges (e.g., the Pacific warm pool), if these lateral exchanges are assumed to be unknown.

Other efforts to improve air–sea fluxes are documented in Trenberth et al. (2011), Mayer et al. (2014) or Liu et al. (2015), who use atmospheric reanalysis products to assess air–sea fluxes based on atmospheric transport divergences. This approach could also work in the ocean, since the ocean currents should be strongly geostrophically constrained by the assimilated Argo profile data, although the ability of the ocean to store heat means that, to 1st order, surface flux errors are more directly compensated by assimilation increments, as shown here in Fig. 3. The comparison of ocean heat transports is a future objective of the ORA-IP program and we would argue that a demonstrable ability to reproduce ocean heat and freshwater transports with smaller error bounds is a key requirement of a good ocean assimilation product since this is information that is not directly available from the upper ocean observations of temperature and salinity.

In future coupled atmosphere–ocean reanalyses using coupled data assimilation schemes should help improve air–sea fluxes within reanalysis products. Truly coupled approaches to reanalysis would allow critical observational data such as sea surface temperatures to be assimilated correctly, e.g., from the ESA CCI program, Merchant et al. (2014), including their error representations. In the current ORA-IP products there are several coupled reanalysis results, but lower resolution models are being used and SSTs are generally still being assimilated as statistically gridded surface products, which will not correctly represent the spatial and temporal information content available from the satellite observational fields.

Finally, we note that the availability of in situ flux datasets suitable for evaluating large scale long-term products such as from reanalyses is still very limited (Yu et al. 2013). The tropical buoys used here provide a suitable first step but more mid-latitude buoys, e.g., from OceanSITES, with longer time records are needed in order to assess the lower frequency variability in heat fluxes outside the tropics. Cross calibration between buoy deployments is needed to create these long record datasets, as provided for the Stratus buoy (Bob Weller personal communication), and the wider availability of such products would allow greater use by the modelling and satellite data processing communities.