Using satellite and reanalysis data to evaluate the representation of latent heating in extratropical cyclones in a climate model

Extratropical cyclones are a key feature of the weather in the extratropics, which climate models need to represent in order to provide reliable projections of future climate. Extratropical cyclones produce significant precipitation, and the associated latent heat release can play a major role in their development. This study evaluates the ability of a climate model, HiGEM, to represent latent heating in extratropical cyclones. Remote sensing data is used to investigate the ability of both the climate model and the ERA-Interim (ERAI) reanalysis to represent extratropical cyclone cloud features before latent heating itself is assessed. An offline radiance simulator, COSP, and the ISCCP and CloudSat datasets are used to evaluate comparable fields from HiGEM and ERAI. HiGEM is found to exhibit biases in the cloud structure of extratropical cyclones, with too much high cloud produced in the warm conveyor belt region compared to ISCCP. Significant latent heating occurs in this region, derived primarily from HiGEM’s convection scheme. ERAI is also found to exhibit biases in cloud structure, with more clouds at lower altitudes than those observed in ISCCP in the warm conveyor belt region. As a result, latent heat release in ERAI is concentrated at lower altitudes. CloudSat indicates that much precipitation may be produced at too low an altitude in both HiGEM and ERAI, particularly ERAI, and neither captures the observed variability in precipitation intensity. The potential vorticity structure in composite extratropical cyclones in HiGEM and ERAI is also compared. A more pronounced tropopause ridge evolves in HiGEM on the leading edge of the composite as compared to ERAI. One future area of research to be addressed is what impact these biases in the representation of latent heating have on climate projections produced by HiGEM.
The biases found in ERAI indicate caution is required when using reanalyses to study cloud features and precipitation processes in extratropical cyclones or using reanalysis to evaluate climate models’ ability to represent their structure.


It is well known that latent heating can play a significant role in the evolution of ETCs (e.g., Manabe 1956; Sanders and Gyakum 1980; Davis et al. 1993; Zhu and Newell 1994; Stoelinga 1996; Pomroy and Thorpe 2000), though this can vary substantially from case to case (e.g., Kuo and Low-Nam 1990; Smith 2000; Wernli et al. 2002; Dacre and Gray 2009; Fink et al. 2012; Dearden et al. 2016). Latent heating has been shown to significantly influence the evolution and deepening of some of the most damaging storms (e.g., Ulbrich et al. 2001; Liberato et al. 2011) and to influence frontal structure and propagation around ETCs (see Posselt and Martin 2004; Reeves and Lackmann 2004). Therefore, the fidelity with which processes associated with latent heating in ETCs are simulated plays an important role in the ability of climate models to represent the climate of the extratropics.
Latent heat release modifies the potential vorticity (PV) structure of ETCs. Viewing ETCs from a PV perspective can therefore be used to distil the impact of latent heating on ETCs, for example through its modification of the tropopause around ETCs (e.g., Grams et al. 2011; Joos and Wernli 2011). Latent heat release generates positive (cyclonic) PV anomalies below and negative (anticyclonic) PV anomalies above the location of heating (Hoskins et al. 1985; Wernli and Davies 1997). Cyclonic PV anomalies below the heating maximum can act to enhance the low-level circulation (Plant et al. 2003) and strengthen flow in and downstream of the warm conveyor belt (WCB; Grams et al. 2011), a band of ascending moist air responsible for much of the precipitation in ETCs. Anticyclonic PV generation above the heating maximum can help to slow the movement of the upper-level PV maximum (the trough), thus maintaining the vertical tilt of the ETC (Stoelinga 1996), and can also enhance downstream ridging (Davis et al. 1993). Capturing both the location and magnitude of diabatic processes is therefore necessary to realistically reproduce the evolution of the flow around ETCs and their consequent interaction with the background environment and flow (Pomroy and Thorpe 2000; Massacand et al. 2001; Dirren et al. 2003; Methven 2013).
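The sign of these PV anomalies can be illustrated with a worked example. A commonly used approximation, keeping only the vertical derivative of the heating rate in pressure coordinates, is D(PV)/Dt ≈ −g(f + ζ) ∂θ̇/∂p, where θ̇ is the diabatic heating rate and f + ζ the absolute vorticity. The sketch below applies it to an idealised heating profile; the profile shape, its 600 hPa peak and the constant absolute vorticity are hypothetical, and neglecting horizontal heating gradients is a simplification rather than the diagnostic used in this study.

```python
import numpy as np

G = 9.81        # gravitational acceleration (m s^-2)
PVU = 1.0e-6    # 1 PVU = 1e-6 K m^2 kg^-1 s^-1


def pv_tendency_from_heating(p_hpa, heating_k_per_h, abs_vorticity=1.0e-4):
    """Approximate PV tendency (PVU per hour) from a vertical heating profile.

    Uses D(PV)/Dt ~ -g * (f + zeta) * d(theta_dot)/dp, keeping only the
    vertical derivative of the heating rate (an idealised simplification).
    abs_vorticity is a hypothetical constant value of f + zeta (s^-1).
    """
    p_pa = np.asarray(p_hpa, dtype=float) * 100.0
    theta_dot = np.asarray(heating_k_per_h, dtype=float) / 3600.0   # K s^-1
    dthetadot_dp = np.gradient(theta_dot, p_pa)                       # K s^-1 Pa^-1
    tendency = -G * abs_vorticity * dthetadot_dp                      # K m^2 kg^-1 s^-2
    return tendency / PVU * 3600.0                                    # PVU h^-1


# Hypothetical heating profile peaking at 600 hPa with 1 K per hour.
p = np.linspace(1000.0, 200.0, 81)
heating = np.exp(-((p - 600.0) / 150.0) ** 2)
dpv = pv_tendency_from_heating(p, heating)

print(np.all(dpv[p > 600.0] > 0.0))  # True: PV generated below the heating maximum
print(np.all(dpv[p < 600.0] < 0.0))  # True: PV destroyed above it
```

Because heating decreases with increasing pressure below its maximum, ∂θ̇/∂p is negative there and the PV tendency is positive; the reverse holds above the maximum.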
This study uses an objective feature tracking technique to study composite ETCs. The ability of a state-of-the-art climate model, HiGEM (Shaffrey et al. 2009), to reproduce latent heating in ETCs is assessed using data from the ERA-Interim (ERAI) reanalysis (Dee et al. 2011; Simmons et al. 2007) and the ISCCP (Rossow and Schiffer 1991, 1999) and CloudSat (Stephens et al. 2002) datasets. This framework also allows an assessment of ETC structure in ERAI relative to the observational datasets. Whilst it is not possible to directly evaluate processes associated with latent heating using remotely retrieved fields, variables such as cloud amount, cloud height, optical depth and reflectivity allow a comprehensive evaluation of structural characteristics associated with latent heating in ETCs. Having established the uncertainty in the fidelity with which these fields are reproduced in both model and reanalysis, latent heating and associated processes in the model are then directly compared to reanalysis. For the first time, this study combines both remote sensing data (see e.g., Field et al. 2008, 2011; Govekar et al. 2014; Booth et al. 2013) and reanalysis (see e.g., Bengtsson et al. 2006, 2009; Catto et al. 2010) to evaluate a climate model within a composite cyclone-centred framework, allowing both the structure of cloud features to be evaluated and the underlying processes within the model which are responsible for any errors in the representation of those features to be investigated. The study focuses on winter (DJF) Northern Hemisphere ETCs, since this is the season when the storm tracks and ETCs in the Atlantic and Pacific are at their most intense.
In Sect. 2 the climate model, reanalysis and remote sensing datasets, tracking algorithm and offline simulator, COSP, are described. In Sect. 3, the climate model is evaluated against the ISCCP and CloudSat datasets before the latent heating tendencies and PV structures of composite ETCs in HiGEM and ERAI are compared. The findings and limitations of the study are discussed in Sect. 4.

Data and methods
In order to assess the performance of the model, suitable observations are required as a benchmark. Reanalyses are often employed in this context. Reanalysis assimilates observations that help to directly constrain pressure, temperature, winds and, to some extent, humidity, which in turn helps to indirectly constrain the cloud and precipitation associated with meteorological structures. However, the reanalysis does not directly constrain cloud condensate amount and phase, or precipitation amount and vertical structure. The condensate and precipitation amounts and phase (liquid/ice) and associated latent heating are therefore very dependent on the specific characteristics of the forecast model used within the data assimilation scheme in the reanalysis. Remote sensing data, where available, provides an opportunity to confront this challenge and is combined with reanalysis in this study to comprehensively evaluate HiGEM's ability to represent cloud structure and diabatic processes in ETCs.

HiGEM climate model
The model evaluated in this study is HiGEM1.2 (Shaffrey et al. 2009), which is part of the UK Met Office Hadley Centre Global Environmental Model (HadGEM) family of models and is based on the HadGEM1 model. HiGEM is a higher resolution version of HadGEM1, in both atmosphere and ocean, which has been amended in certain respects. The horizontal resolution is 0.83° × 1.25° (N144) for the atmosphere and 1/3° × 1/3° globally for the ocean and sea ice. The atmosphere has 38 vertical levels with a model top at 39 km. A detailed description of the model formulation can be found in Shaffrey et al. (2009), Martin et al. (2006) and references therein. HiGEM has previously been shown to be capable of capturing the large-scale features of composite ETCs when compared to reanalysis (ERA-40, Uppala et al. 2005), though with differences in the finer scale details of the structure of the warm conveyor belt (Catto et al. 2010). HiGEM has also been shown to reproduce estimates of precipitation around composite ETCs that are broadly comparable to the ERAI reanalysis and the GPCP daily dataset (Hawcroft et al. 2015), indicating that the total column-integrated latent heating in the model is in agreement with observations. However, evaluating precipitation does not provide any information about whether the vertical distribution of latent heating, which is important for ETC evolution, is well represented.
In this study, the model has been integrated for 5 DJF seasons, each initialised from consecutive years of a preexisting integration of the model forced with present-day radiative forcings. The computational requirements of storing and processing the data required for this study, in particular the use of the forward simulator, prevent using a longer period. Decadal variability in the precipitation produced by the model has previously been shown to be small (Hawcroft et al. 2015). The results of the study have been compared to ETC composites using single seasons of data and do not materially change due to the high number of samples in each season (over 350 individual ETCs), further suggesting sampling over a 5 year period is sufficient to eliminate the effects of internal variability.

ERAI reanalysis
The ERA-Interim (ERAI) reanalysis (Dee et al. 2011; Simmons et al. 2007) uses a 4D-Var data assimilation system to incorporate observations over a 12-h analysis period, with forecasts commencing at 00:00UTC and 12:00UTC, and has a spectral model resolution of T255 (approximately equivalent to 80 km/0.7° in the mid-latitudes). ERAI has 60 model levels, with an atmospheric top at 0.1 hPa. ERAI has been shown to perform well compared to other high resolution reanalysis products in an evaluation of several metrics of Northern Hemisphere ETCs (Hodges et al. 2011).
The ERAI reanalysis does not provide as standard many of the variables required for this study, such as latent heating tendencies and several of the inputs required for the COSP simulator. To overcome this, short forecasts using the ERAI model were initialised for 5 DJF periods. These forecasts were run from existing ERAI analyses starting at 00:00UTC and 12:00UTC each day, and the twice-daily forecasts are then combined to provide complete seasons of data. Fluxes which are computed from the forecasts as accumulations suffer from 'spin-up' from the initial state. Previous work analysing precipitation (e.g., Kobold and Sušelj 2005; Simmons et al. 2010; Kållberg 2011; Hawcroft et al. 2012; de Leeuw et al. 2014; Hawcroft et al. 2015) has shown that lead times of 12-24 h are less affected by this problem. Therefore, data at 3-hourly intervals extracted from the period 12-24 h in each successive forecast are combined to create each season of data (see Hawcroft et al. 2015, Figure 11). The years selected for this study are also chosen to overlap with the availability of the two remote sensing datasets, ISCCP (available until December 2009) and CloudSat (available from June 2006). The seasons DJF 2006-2007, 2007-2008, 2008-2009, 2010-2011 and 2012-2013 are used, such that the majority of the ETCs in the ERAI/ISCCP/CloudSat data used in this study are the same, with maximal overlap between ERAI and CloudSat.
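The stitching of successive forecasts can be sketched as a mapping from each 3-hourly valid time to the forecast that supplies it. This minimal sketch assumes that instantaneous fields are taken at leads of 12, 15, 18 and 21 h, so that consecutive 00:00UTC and 12:00UTC forecasts tile the season without overlap; the function name and the exact choice of lead times are illustrative assumptions (for accumulated fluxes, the 3-h intervals ending at 15-24 h would be the natural analogue).

```python
from datetime import datetime, timedelta

INIT_HOURS = (0, 12)   # forecasts initialised at 00:00UTC and 12:00UTC
MIN_LEAD_H = 12        # start of the retained 12-24 h window
STEP_H = 3             # 3-hourly output


def source_forecast(valid_time):
    """Return (initialisation time, lead in hours) that supplies valid_time.

    Assumes leads of 12, 15, 18 and 21 h are kept from each forecast, so that
    consecutive 00/12 UTC forecasts tile the season without overlap.
    """
    if (valid_time.hour % STEP_H or valid_time.minute
            or valid_time.second or valid_time.microsecond):
        raise ValueError("valid_time must fall on a 3-hourly synoptic time")
    latest_allowed_init = valid_time - timedelta(hours=MIN_LEAD_H)
    # Most recent 00 or 12 UTC initialisation at least 12 h before valid_time.
    init_hour = max(h for h in INIT_HOURS if h <= latest_allowed_init.hour)
    init = latest_allowed_init.replace(hour=init_hour)
    lead_h = int((valid_time - init).total_seconds() // 3600)
    return init, lead_h


init, lead = source_forecast(datetime(2007, 1, 1, 9))
print(init, lead)  # 2006-12-31 12:00:00 21
```

Expressing the selection as a function of valid time makes it easy to check that every 3-hourly time in a season is supplied by exactly one forecast.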

ISCCP
The International Satellite Cloud Climatology Project dataset (ISCCP, Rossow and Schiffer 1991, 1999) uses infrared and visible radiances from a number of geostationary and polar orbiting satellites (e.g., Schiffer and Rossow 1985; Zhang et al. 1995; Rossow and Zhang 1995). The D1 dataset, employed here, is available from July 1983 to December 2009, is 3-hourly and global, and has 280-km grid cells (equivalent to 2.5° × 2.5° at the equator). The data is based on brightness temperature estimates from up to five geostationary and two polar orbiting satellites. Infrared (IR) radiances provide estimates at all times, with visible/near-IR (VIS-IR) adjusted brightness temperatures (Tb) available during daylight hours. The latter are more reliable, since the IR-only analysis detects low cloud less accurately, particularly where the cloud is broken and where clouds are optically thin (Rossow and Schiffer 1999).
In this study, VIS-IR adjusted fields are employed during daylight hours only for the years DJF 2004-2005 to DJF 2008-2009. ISCCP estimates of cloud fraction, cloud top pressure (ctp) and optical thickness (τ) are used to evaluate radiance-equivalent fields derived from the HiGEM and ERAI datasets using the COSP simulator (discussed in Sect. 2.4, below).

CloudSat
CloudSat is a polar orbiting satellite flying in the A-Train constellation (Stephens et al. 2002) with a 94-GHz nadir-looking Cloud Profiling Radar (CPR), which measures the power backscattered by hydrometeors as a function of distance from the radar. The CPR has a cross-track width of 1.4 km, along-track resolution of 1.7 km, vertical resolution of 240 m and minimum detectable reflectivity of ∼−30 dBZ (Tanelli et al. 2008). Given the ability of the CPR to retrieve the vertical structure of clouds, it is a potentially powerful tool for the evaluation of extratropical cyclones (Posselt et al. 2008; Naud et al. 2012, 2015; Crespo and Posselt 2016). However, the narrow cross-track width and orbiting nature of the satellite mean that sampling is low, with fewer than 230 overpasses sampled at each composite gridbox across the 5 DJF seasons used here; this is discussed further in Sect. 3.2. The seasons sampled are 2006-2007, 2007-2008, 2008-2009, 2010-2011 and 2012-2013, though for the final season data was only available during daylight hours, reducing sampling. Within each composite gridbox, many individual samples may be averaged from a single overpass, and these are generally strongly correlated (see Marchand 2012). This sampling is drawn from a total of ∼8000 Northern Hemisphere satellite overpasses and 1768 ETCs. In this work, ETCs are sampled north of 30°N, where CloudSat spends ∼1/3 of its time.
Cyclone track data is taken from ERAI and is available at 3-h intervals. CloudSat data from a window of ±90 min around each 3-hourly ERAI data time is therefore assigned to that time. This short window reduces sampling but also prevents the composites being smeared by the propagation of cyclones either side of the compositing time. Tests performed with a longer sampling window (not shown) did not materially improve the structure of composite ETCs, but did smear their features. Sampling is further discussed in Sect. 3.2. In this study, the 2B-GEOPROF dataset is used and the CloudSat cloud mask confidence threshold of 20 (unitless) is applied to remove false detections (see Marchand et al. 2008). Data within 1.2 km of the surface is also excluded due to contamination by surface clutter.
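The time assignment and quality masking described above can be sketched as follows. Because ±90 min windows around 3-hourly track times tile the timeline, assigning each profile to the nearest track time is equivalent; how a profile falling exactly 90 min from two track times is handled, whether the cloud mask threshold is inclusive, and the function names are all assumptions made for illustration, not details of the 2B-GEOPROF processing used here.

```python
import numpy as np

TRACK_STEP_MIN = 180       # ERAI track points every 3 h, so windows are +/- 90 min
CLOUD_MASK_MIN = 20        # 2B-GEOPROF cloud mask confidence threshold (assumed inclusive)
CLUTTER_HEIGHT_KM = 1.2    # bins within 1.2 km of the surface are discarded


def nearest_track_time(obs_minutes):
    """Assign observation times (minutes since a reference 00 UTC) to the nearest track time.

    Rounds half up, so a profile exactly 90 min after a track time goes to the
    following track time (tie handling is an assumption).
    """
    obs = np.asarray(obs_minutes, dtype=float)
    return np.floor(obs / TRACK_STEP_MIN + 0.5) * TRACK_STEP_MIN


def usable_bins(cloud_mask, height_above_surface_km):
    """Boolean mask of radar bins retained for compositing."""
    confident = np.asarray(cloud_mask) >= CLOUD_MASK_MIN
    above_clutter = np.asarray(height_above_surface_km, dtype=float) >= CLUTTER_HEIGHT_KM
    return confident & above_clutter


print(nearest_track_time([0, 89, 90, 91, 179, 270]).tolist())
# [0.0, 0.0, 180.0, 180.0, 180.0, 360.0]
```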

COSP
The Cloud Feedback Model Intercomparison Project Observation Simulator Package (COSP, Bodas-Salcedo et al. 2011) ingests a number of model fields relating to cloud and precipitation properties and uses these to simulate retrievals from satellite instruments. The COSP simulator incorporates several individual instrument simulators, including those for ISCCP and CloudSat.
The ISCCP simulator (known as ICARUS; Klein and Jakob 1999; Webb et al. 2001; Bodas-Salcedo et al. 2011) mimics the assumptions of the ISCCP retrieval algorithm and is intended to provide ISCCP-like retrievals of a column so that model output can be evaluated against retrievals from ISCCP itself. Many studies have used the simulator to evaluate global cloud amounts and types in climate models (e.g., Zhang et al. 2005; Williams and Tselioudis 2007; Williams and Webb 2009; Kay et al. 2012; Lauer and Hamilton 2013; Klein et al. 2013). ISCCP assumes that clouds are plane-parallel and homogeneous in each pixel. In the simulator, 20 sub-grid columns are used within each gridbox, and the same assumption is used to estimate optical depth in each column. The effect of neglecting this sub-pixel/sub-gridbox variability on optical depth estimates is always negative (i.e., optical depth is underestimated) and may be significant in optically thick clouds. Pincus et al. (2012) note that, in addition, where cloud is broken, sub-pixel artefacts associated with 3-dimensional radiative effects may also become important; these are not captured by the simulator. The simulator is used in VIS-IR adjusted mode to replicate the ISCCP-D1 VIS-IR data. The simulator assumes a uniform distribution of condensate across sub-columns, which is consistent with the assumptions in both HiGEM and ERAI but may also affect the comparability of gridbox mean values with observations, particularly in convectively active regions (see Posselt et al. 2012).
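The direction of the sub-pixel bias can be illustrated with a toy calculation. Reflectance increases with optical depth but saturates, i.e. it is a concave function of τ, so inverting the pixel-mean reflectance of an inhomogeneous pixel gives an optical depth no larger than the pixel-mean τ (Jensen's inequality). The sketch assumes a simple two-stream-style reflectance for a non-absorbing cloud, R = (1 − g)τ / (2 + (1 − g)τ), with a hypothetical asymmetry parameter g = 0.85; it is not the radiative transfer used by ICARUS or by the ISCCP retrieval.

```python
import numpy as np

ASYM_G = 0.85  # hypothetical asymmetry parameter for a water cloud


def reflectance(tau):
    """Two-stream-style reflectance of a non-absorbing plane-parallel cloud (toy model)."""
    a = 1.0 - ASYM_G
    tau = np.asarray(tau, dtype=float)
    return a * tau / (2.0 + a * tau)


def tau_from_reflectance(r):
    """Exact inverse of reflectance()."""
    a = 1.0 - ASYM_G
    r = np.asarray(r, dtype=float)
    return 2.0 * r / (a * (1.0 - r))


def retrieved_tau(subcolumn_tau):
    """Optical depth inferred by treating the pixel-mean reflectance as homogeneous."""
    return float(tau_from_reflectance(np.mean(reflectance(subcolumn_tau))))


# 20 sub-columns with hypothetical optical depths; both cases have mean tau = 30.
uniform = np.full(20, 30.0)
broken = np.concatenate([np.full(10, 5.0), np.full(10, 55.0)])
print(retrieved_tau(uniform))  # ~30: no bias for a homogeneous pixel
print(retrieved_tau(broken))   # about 15.6: the homogeneous assumption biases tau low
```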
The CloudSat simulator (known as Quickbeam; Haynes et al. 2007; Bodas-Salcedo et al. 2011) provides the ability to directly compare equivalent fields from CloudSat and reanalysis/forecast/climate models (e.g., Bodas-Salcedo et al. 2008; Greenwald et al. 2010; Naud et al. 2010; Govekar et al. 2011; Field et al. 2011; Jiang et al. 2012; Franklin et al. 2013; Govekar et al. 2014; Booth et al. 2013). The simulator works by distributing model hydrometeors amongst the sub-grid columns, with radar reflectivity, including attenuation, calculated at each model level. The CloudSat radar is only capable of detecting returns above −30 dBZ (Tanelli et al. 2008), and this sensitivity, combined with the vertical resolution of 240 m, means that some optically or geometrically thin clouds are not detected (Stephens et al. 2002; Marchand et al. 2008). The radar limitations therefore need to be taken into account when comparing model output to CloudSat retrievals (Marchand et al. 2009). In addition, there are sensitivities associated with the formulation of the simulator that must be considered when assessing uncertainty in simulated reflectivities. In particular, the particle size distributions (PSDs) and fallspeed parameterisations used in the climate and reanalysis models should be reflected in the assumptions used in the simulator. To the extent that these assumptions cannot or should not be reflected in the simulator, the use of alternative assumptions requires justification. In either scenario, the sensitivity of the results to the specified microphysical assumptions needs to be taken into account (see Bodas-Salcedo et al. 2008; Di Michele et al. 2012; Johnston et al. 2012). These issues are further discussed in Sect. 3.2.

Tracking and compositing methodology
The tracking algorithm used in this study (e.g., Hodges 1994, 1995, 1999) has previously been applied in studies of ETCs (e.g., Bengtsson et al. 2007, 2009; Catto et al. 2010; Dacre et al. 2012; Hawcroft et al. 2012; Zappa et al. 2014; Hawcroft et al. 2015). Northern Hemisphere ETCs are identified as features exceeding 1 × 10⁻⁵ s⁻¹ in the 850 hPa relative vorticity field, truncated to T42, in 3-hourly data, and are combined into cyclone tracks. Tracks identified north of 30°N, with a lifetime of at least 2 days and which travel at least 1000 km, are retained as ETCs and included in the analysis. The sensitivity of the results to these thresholds has been tested by relaxing the time and distance criteria, and our conclusions were insensitive to these changes.
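The track selection criteria can be expressed as a simple filter. This sketch assumes that every track point must lie north of 30°N and that the distance criterion refers to the great-circle separation between the first and last track points; the text does not specify either detail (genesis latitude or along-track path length are plausible alternatives), and the `Track` container and function names are hypothetical.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass
class Track:
    """Hypothetical container for one cyclone track (one entry per time step)."""
    lats: list              # degrees north
    lons: list              # degrees east
    step_hours: float = 3.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def is_etc(track, min_lat=30.0, min_lifetime_h=48.0, min_distance_km=1000.0):
    """Apply latitude, lifetime and distance criteria (assumed interpretations)."""
    lifetime_h = (len(track.lats) - 1) * track.step_hours
    distance_km = great_circle_km(track.lats[0], track.lons[0],
                                  track.lats[-1], track.lons[-1])
    return (all(lat > min_lat for lat in track.lats)
            and lifetime_h >= min_lifetime_h
            and distance_km >= min_distance_km)


track = Track(lats=[45.0] * 17, lons=[float(i) for i in range(17)])
print(is_etc(track))  # True: 48 h lifetime and roughly 1256 km travelled
```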
The compositing methodology has also previously been applied in studies of ETCs (e.g., Bengtsson et al. 2007, 2009; Catto et al. 2010; Hawcroft et al. 2015). It involves several steps and is discussed in detail in Catto et al. (2010). To summarize the procedure, the tracks are first identified and selected. The field being composited (e.g., precipitation, potential temperature) is then extracted on a radial grid at each of the identified cyclone points and is rotated such that the direction of travel of all ETCs is aligned. A time within each track's lifecycle may then be chosen for compositing, such as the time of maximum 850 hPa relative vorticity or maximum precipitation. The extracted field, for each ETC at the relevant point in the lifecycle, is then averaged across all ETCs to create a composite.
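The rotation and averaging steps can be sketched for fields that have already been extracted on a cyclone-centred (radius, azimuth) grid. The sketch assumes azimuths measured clockwise from north at uniform spacing and rounds each track's heading to the nearest azimuth bin before shifting; interpolating in azimuth would be a smoother alternative. The grid layout, the reference direction and the function names are illustrative assumptions rather than the implementation of Catto et al. (2010).

```python
import numpy as np


def rotate_to_heading(field, heading_deg, ref_deg=90.0):
    """Rotate a (radius, azimuth) field so the direction of travel points to ref_deg.

    Azimuth index j is assumed to sit at j * 360 / n_az degrees clockwise from north.
    """
    field = np.asarray(field, dtype=float)
    n_az = field.shape[1]
    d_az = 360.0 / n_az
    shift = int(round((heading_deg - ref_deg) / d_az))
    # np.roll(a, -s, axis=1)[:, j] == a[:, (j + s) % n_az]
    return np.roll(field, -shift, axis=1)


def composite(fields, headings_deg, ref_deg=90.0):
    """Rotate every ETC to a common heading and average across ETCs."""
    rotated = [rotate_to_heading(f, h, ref_deg) for f, h in zip(fields, headings_deg)]
    return np.mean(rotated, axis=0)


# Two hypothetical ETCs, each with a feature straight ahead of the centre.
n_r, n_az = 5, 36
north_mover = np.zeros((n_r, n_az))
north_mover[:, 0] = 1.0    # heading 0 deg, feature at azimuth 0 deg
east_mover = np.zeros((n_r, n_az))
east_mover[:, 9] = 1.0     # heading 90 deg, feature at azimuth 90 deg
comp = composite([north_mover, east_mover], [0.0, 90.0])
print(comp[:, 9])  # [1. 1. 1. 1. 1.]: the features align after rotation
```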
Tracks are identified in 3-hourly 850 hPa vorticity fields in HiGEM and ERAI. Tracks of the ETCs used for the ISCCP and CloudSat data are identified in the analysed ERAI field. For ERAI, the ETCs are identified in the forecast field (see Sect. 2.2) such that the ETC location is consistent with the data being composited, since the analysed and forecast location may differ.
The ISCCP VIS-IR data is only available during daylight hours, reducing the sample size. The CloudSat data is only available where the satellite overpass coincides with the presence of an ETC, leading to very low sampling per gridpoint in each composite. Composite ETCs in Sects. 3.1 and 3.2 are therefore produced by averaging 5 adjacent 3-hourly data times during the composite storm lifecycles, such that the composites contain data from times at −6, −3, 0, +3 and +6 h from the designated point in the ETC lifecycle. This provides approximately 500-2500 samples at each gridpoint within the 5-step composites in ISCCP, depending on the stage in the lifecycle being analysed, from a total of 1768, 1772 and 1943 individual ETCs in ISCCP, ERAI and HiGEM, respectively. Sampling for CloudSat, as noted in Sect. 2.3.2 and discussed further in Sect. 3.2, is far lower. The broad spatial patterns of ETC activity in HiGEM are comparable to ERAI (see Hawcroft et al. 2015, Figure 1). The performance of HiGEM in this respect compares favourably with the CMIP5 models (see Zappa et al. 2013).
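Pooling the five lagged times with missing observations can be sketched as a NaN-aware average that also reports how many samples contribute to each gridpoint. This sketch assumes that all samples from all ETCs and all five lags are pooled before averaging (rather than averaging each lag separately), with NaN marking gridpoints without data, such as ISCCP VIS-IR at night or gridpoints missed by a CloudSat overpass; the function names are hypothetical.

```python
import numpy as np

LAGS_H = (-6, -3, 0, 3, 6)   # offsets (h) from the designated lifecycle time


def lag_indices(t0_index, step_h=3):
    """Track-point indices of the five times pooled around t0_index."""
    return [t0_index + lag // step_h for lag in LAGS_H]


def pooled_composite(samples):
    """NaN-aware mean and sample count at each gridpoint.

    samples: array (n_samples, ny, nx) pooled over all ETCs and all lags,
    with NaN wherever no observation is available.
    """
    samples = np.asarray(samples, dtype=float)
    count = np.sum(np.isfinite(samples), axis=0)
    total = np.nansum(samples, axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = np.where(count > 0, total / count, np.nan)
    return mean, count


print(lag_indices(10))  # [8, 9, 10, 11, 12]
```

Returning the count alongside the mean makes the uneven sampling explicit, which matters most for the sparse CloudSat composites.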
Composites in this study are shown at 12-h prior to maximum dynamical intensity (unless otherwise specified), which is typically when precipitation, and the associated latent heating, is at its peak intensity. Relative vorticity at 850 hPa is used as the metric of dynamical intensity for these purposes. Figure 1 shows the evolution and decay of relative vorticity and precipitation in the ETCs from HiGEM and ERAI used in this analysis. It can be seen that precipitation peaks and begins to decay prior to the maximum dynamical intensity (as was also shown by Bengtsson et al. 2009). To orientate the reader for the remainder of this study, Fig. 2 shows composites of fields in HiGEM at 12-h prior to maximum dynamical intensity. The direction of propagation of the ETCs is shown in Fig. 2; all ETCs are rotated in this study, so the direction of propagation is the same in all figures. The structure of the features shown in Fig. 2 is qualitatively similar in HiGEM and ERAI, so only HiGEM is shown here for convenience. Significant precipitation is produced where the warm conveyor belt (WCB) reaches the warm front (Fig. 2a). Vertical velocities are also at their maximum (Fig. 2b) in this area of the composite, as the large-scale flow along the WCB is forced upwards at the front. This is the area where latent heat release is at its maximum and is therefore a key region of interest for this study. In Fig. 2c low-level winds can be seen, with particularly strong flow on the southerly side of the composite associated with the WCB and the cold front. The frontal nature of the composite ETC is further observed in Fig. 2d, where equivalent potential temperature (θe) contours are displayed: the cold front trails along the southern flank of the composite and the warm front is situated on the leading edge of the composite, coincident with the location of peak precipitation (Fig. 2a) and vertical velocities (Fig. 2b).
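Selecting the compositing time for each track can be sketched as follows, assuming 3-hourly track points and that a track whose vorticity maximum occurs less than 12 h after its first point is skipped; how such tracks are actually handled is not stated in the text, and the function name is hypothetical.

```python
import numpy as np

STEP_H = 3       # track points are 3-hourly
OFFSET_H = 12    # composite 12 h before maximum 850 hPa relative vorticity


def compositing_index(vort850):
    """Index of the track point 12 h before the 850 hPa vorticity maximum.

    Returns None when that time precedes the first track point (assumed here
    to mean the track is skipped).
    """
    i_max = int(np.argmax(vort850))
    i = i_max - OFFSET_H // STEP_H
    return i if i >= 0 else None


print(compositing_index([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 5.0, 4.0]))  # 1
```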
Statistical analysis of certain fields in this study is included for the whole composite, the region of maximum ascent (enclosed by solid black lines in Fig. 2b, d) and the WCB inflow region (enclosed by dashed lines in Fig. 2b, d). Animations showing the lifecycle of several key fields of composite ETCs as they evolve and decay are provided in the Supplementary Material to this paper, as referred to in the text and figure captions.

ISCCP composite storm structure
In Fig. 3a-c composite cloud fraction is shown for ISCCP, HiGEM and ERAI. Confidence intervals (at 95 %) at each gridpoint in these composites are less than ±3 % for ISCCP and ±2 % for HiGEM and ERAI. These composites provide an assessment of the ability of HiGEM and ERAI to reproduce the position and magnitude of cloud fields around composite ETCs. The cloud fraction in ISCCP is far greater than in HiGEM and ERAI, indicating that ISCCP observes more cloud around ETCs than either HiGEM or ERAI produces. Other studies (see Zhang et al. 2005; Kay et al. 2012, and references therein) have shown that climate models typically underestimate total cloud amounts globally, and this has been shown to extend to biases around ETCs and fronts (e.g., Naud et al. 2010, 2014). All three datasets have peak cloud fractions in a region which is centred on the peak precipitation and maximum ascent (as shown in Fig. 2a, b for HiGEM), where the WCB ascends over the surface warm front.
In Fig. 3d-f composite cloud top pressure (ctp) in ISCCP, HiGEM and ERAI is also shown. The gridpoint confidence intervals (at 95 %) are less than ±20 hPa in ISCCP and ±8 hPa in HiGEM and ERAI. In these composites, the values are derived as the average ctp from gridpoints where cloud occurs. One striking difference is the high cloud bias (lower ctp) in HiGEM throughout much of the composite, particularly on the southern flank, where the trailing cold front (and WCB) is situated. Around the region of maximum ascent/precipitation (see Fig. 2), the ctp in HiGEM and ISCCP is more closely comparable. In ERAI, cloud tops are lower (ctp is higher) than in HiGEM throughout the composite, and the broad structure of the cloud features is generally in better agreement with ISCCP. However, ERAI does not reproduce the highest cloud tops observed in ISCCP in the region of maximum ascent and precipitation (see Fig. 2), where the surface warm front is situated. The differences in ctp between ERAI and HiGEM are consistent with the results of Catto et al. (2010), who used relative humidity in ERA-40 and HiGEM as a proxy for clouds and found that in HiGEM the relative humidity field extended to higher altitudes than in ERA-40, particularly further into the southern sector of the composite, as seen here.
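The conditional averaging of ctp over cloudy samples, together with a gridpoint confidence interval, can be sketched as below. The text does not state how the 95 % intervals were computed; this sketch assumes a normal approximation, 1.96 s/√n, which treats samples as independent and would therefore understate the uncertainty where samples are correlated. The array layout and function name are illustrative.

```python
import numpy as np


def conditional_mean_ci(ctp, cloudy, z=1.96):
    """Mean ctp over cloudy samples and a normal-approximation 95 % half-width.

    ctp, cloudy: arrays of shape (n_samples, ...); cloudy is boolean.
    Non-cloudy samples are excluded before averaging.
    """
    ctp = np.where(np.asarray(cloudy, dtype=bool), np.asarray(ctp, dtype=float), np.nan)
    n = np.sum(np.isfinite(ctp), axis=0)
    mean = np.nanmean(ctp, axis=0)
    std = np.nanstd(ctp, axis=0, ddof=1)      # sample standard deviation
    half_width = z * std / np.sqrt(n)
    return mean, half_width, n


# One hypothetical gridpoint: three cloudy samples and one clear sample.
mean, hw, n = conditional_mean_ci(np.array([[300.0], [500.0], [400.0], [900.0]]),
                                  np.array([[True], [True], [True], [False]]))
print(mean[0], n[0])  # 400.0 3
```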
The third variable of interest from ISCCP/COSP is optical depth (τ), which provides additional detail on the structure of the clouds in each dataset by providing information on their thickness. As for ctp, the values are derived as the average τ from gridpoints where cloud occurs. In Fig. 3g-i, τ in both HiGEM and ERAI is greater than in ISCCP throughout the south-eastern quadrant, where the warm sector is located. At the warm front, τ exceeds 50 in ERAI and 35 in HiGEM at this point in the lifecycle. Given the underestimate in cloud fraction, it is not surprising to see an overestimate in τ, since this is likewise a typical global bias of climate models (see Zhang et al. 2005).
Unless otherwise denoted, data included in the composites in this paper is taken from 12-h prior to maximum dynamical intensity, which is measured by relative vorticity at 850 hPa.
In the Supplementary Material ( Figure S1), an animation of the fields in Fig. 3 throughout the composite lifecycle is provided, showing that the differences highlighted at 12-h prior to maximum intensity are typical of the entire lifecycle of the composites.
Fig. 2 Composite ETCs from HiGEM at 12-h prior to maximum dynamical intensity for (a) precipitation (mm h⁻¹), (b) vertical velocity at 700 hPa (hPa h⁻¹), (c) system-relative winds at 950 hPa (m s⁻¹, contours and vectors) and (d) equivalent potential temperature at 850 hPa (K). In (b) and (d) the locations of the maximum ascent (solid black line) and WCB inflow (dashed black line) regions are denoted. The direction of propagation is denoted and is the same for all composites in this study. Plot radii are 20°.
In Fig. 4 joint histograms of cloud top pressure and optical depth are shown for HiGEM, ERAI and ISCCP. The statistics are derived from each of the ETCs in the composites and aggregated, rather than extracted from the composites themselves. In Fig. 4a-c the relationship from all points in the composites is shown. It is striking that HiGEM produces many optically thick clouds with cloud tops around 250 hPa, and that the height of these clouds varies little compared with ERAI and, particularly, ISCCP. This lack of variability would be consistent with cloud frequently being produced by the convection scheme in HiGEM, a hypothesis proposed by Catto et al. (2010) to explain the deeper relative humidity field seen in that study when HiGEM was compared to ERA-40. In ERAI, the distribution of cloud top heights is more varied. There are too many optically thick clouds, though of particular interest is the cluster of cloud tops below 700 hPa. Around the region of maximum ascent (d-f), where much precipitation and latent heat release is concentrated, neither HiGEM nor ERAI reproduces the variability in cloud top pressure or optical depth seen in ISCCP, where optically thick (>10) high-topped (<200 hPa) clouds are observed. Both HiGEM and ERAI frequently produce optically thick clouds within a narrow range of levels (∼300 and ∼400 hPa, respectively).
Finally, when considering the WCB inflow region (g-i), ERAI is seen to produce a bimodal distribution, with cloud tops clustered around 300 and 850 hPa. HiGEM, again, produces many clouds with similar cloud top pressures (∼300 hPa).
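Joint histograms of this kind can be built by pooling cloudy samples from every ETC and binning them in cloud top pressure and optical depth. The bin edges below follow the familiar ISCCP-style classification (boundaries at 440 and 680 hPa and at τ = 3.6 and 23, plus finer subdivisions), but the exact edges, in particular the outermost ones, are assumptions and may differ from those used in Fig. 4; the function name is hypothetical.

```python
import numpy as np

# ISCCP-style bin edges (assumed): cloud top pressure (hPa) and optical depth.
CTP_EDGES = [50.0, 180.0, 310.0, 440.0, 560.0, 680.0, 800.0, 1000.0]
TAU_EDGES = [0.3, 1.3, 3.6, 9.4, 23.0, 60.0, 380.0]


def joint_histogram(ctp_hpa, tau):
    """Percentage of cloudy samples in each (ctp, tau) bin, pooled over all ETCs.

    Samples with missing values, or outside the outermost edges, are ignored.
    """
    ctp = np.ravel(np.asarray(ctp_hpa, dtype=float))
    tau = np.ravel(np.asarray(tau, dtype=float))
    ok = np.isfinite(ctp) & np.isfinite(tau)
    counts, _, _ = np.histogram2d(ctp[ok], tau[ok], bins=[CTP_EDGES, TAU_EDGES])
    total = counts.sum()
    return 100.0 * counts / total if total > 0 else counts


# Hypothetical samples: two thick clouds topped near 250 hPa, one thin low cloud.
h = joint_histogram(np.array([250.0, 250.0, 850.0]), np.array([30.0, 40.0, 2.0]))
print(h.shape)  # (7, 6): ctp bins by tau bins
```

Pooling individual samples, rather than histogramming the composite mean, preserves the variability that the composite averaging would otherwise smooth out.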
To understand the factors contributing to the biases (and differences) in HiGEM and ERAI, it is useful to consider some of the key variables which govern ctp and τ. Cloud ice and liquid are particularly important, and in Fig. 5 cross-sections of cloud liquid water (q_cl) and ice (q_cf) are shown. It should be noted that the representation of crystals and aggregates for radiative purposes differs between HiGEM and ERAI, with snow not being radiatively active in ERAI. As such, the effective radii used in the temperature ranges relevant to these composites (230-245 K) differ (see Edwards et al. 2007; Yang and Liou 1996 for HiGEM and ECMWF 2006; Ebert and Curry 1992; Heymsfield and Platt 1984 for ERAI). Relative to each other, this inconsistency increases τ_ice in HiGEM and decreases it in ERAI.
It can be seen in Fig. 5 that the peak τ in HiGEM is related to the q_cf maxima and in ERAI to the q_cl maxima. In ERAI, q_cl values are much greater at lower levels (peak values in the cross-sections are 89 × 10⁻⁶ and 47 × 10⁻⁶ kg kg⁻¹ for ERAI and HiGEM, respectively), with significantly less q_cf. This suggests that the cloud parameterisations in the two datasets function differently. Cloud phase partitioning is treated differently in ERAI and HiGEM, with a diagnostic temperature-dependent mixed phase in ERAI and a variable phase partition based on physical processes in HiGEM. This could be one reason for at least some of the differences in ice/liquid observed here. Alternatively, and as noted above, Catto et al. (2010) hypothesised that this difference may be associated with the convection scheme in HiGEM being too active, leading to the vertical redistribution of moisture to higher levels. The spatial extent of the cloud fields is consistent with the convection scheme being more active in HiGEM. Similarly, biases in convective parameterisations were invoked by Webb et al. (2001) and Field et al. (2008) to explain the behaviour of other models in the mid-latitudes. This issue is further discussed in Sect. 3.3. In ERAI, low-level q_cl is greater than in HiGEM throughout the inflow region and can be related to the bias in optical depth and cloud top pressure seen in Fig. 4h around 850 hPa. An excessive low-level climatological q_cl bias has been noted in ERAI across the storm track regions by Li et al. (in revision), consistent with the results here. Mace et al. (2011) have shown that the ISCCP simulator performs well at simulating ctp but that simulations of τ are less reliable, with τ estimates from ISCCP in their study being on average 10 % lower than those simulated from ground-based retrievals, and the fraction of optically thick cloud (τ ≥ 23) being 25 % lower in ISCCP than the simulated estimates.
Their results are consistent with the differences seen in these composites, with lower τ in ISCCP throughout the composites than in the estimates produced from ERAI and HiGEM data, though, given the uncertainties, these results should be interpreted with caution. Where cloud is broken, sub-pixel variability may also become important. Based on typical sub-pixel τ variability, Mace et al. estimate that for an actual τ of 60, the bias introduced by using pixel-mean radiances (as in the ISCCP simulator) could be of the order of 15 % of the actual value. Even if such a bias is assumed, the composite τ values for both HiGEM and ERAI remain far greater than those retrieved by ISCCP, and the cloud top pressure/τ distributions are not comparable to ISCCP. It is not possible to decompose quantitatively the extent to which these differences might be attributable to the models producing cloud that is too thick and/or to the ISCCP simulator causing this apparent bias.
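The sign of the pixel-mean radiance bias follows from the concavity of reflectance as a function of τ: the mean of the sub-pixel reflectances is lower than the reflectance of the mean τ (Jensen's inequality), so a τ retrieved from the pixel-mean reflectance underestimates the true mean τ. The sketch below demonstrates this with an idealised saturating reflectance curve R(τ) = τ/(τ + C) and a gamma-distributed sub-pixel τ with mean 60. The curve, the constant C and the distribution are all assumptions chosen for illustration, not the radiative transfer in the ISCCP simulator or the statistics used by Mace et al., so the size of the printed bias should not be compared with their 15 % estimate.

```python
import numpy as np

C = 13.0  # hypothetical saturation constant of the idealised reflectance curve


def reflectance(tau):
    """Idealised saturating reflectance R(tau) = tau / (tau + C) (an assumption)."""
    return tau / (tau + C)


def invert_reflectance(r):
    """Inverse of reflectance(): tau = C * R / (1 - R)."""
    return C * r / (1.0 - r)


def pixel_mean_bias(sub_pixel_tau):
    """Return (mean tau, tau retrieved from the pixel-mean reflectance, relative bias)."""
    tau_mean = float(np.mean(sub_pixel_tau))
    r_mean = float(np.mean(reflectance(sub_pixel_tau)))  # pixel-mean radiance proxy
    tau_retrieved = float(invert_reflectance(r_mean))
    return tau_mean, tau_retrieved, (tau_mean - tau_retrieved) / tau_mean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical sub-pixel variability: gamma distribution with mean 2 * 30 = 60.
    tau_sub = rng.gamma(shape=2.0, scale=30.0, size=100_000)
    mean_tau, retrieved_tau, bias = pixel_mean_bias(tau_sub)
    print(f"mean tau = {mean_tau:.1f}, retrieved = {retrieved_tau:.1f}, bias = {100 * bias:.0f} %")
```

For a homogeneous pixel the retrieval is exact; any sub-pixel spread pushes the retrieved τ below the true mean, which is the direction of the bias discussed above.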
Using ISCCP and the COSP simulator to evaluate HiGEM and ERAI provides a tool to compare observational data directly against the model/reanalysis output. Some uncertainties remain, particularly in the simulated τ field, though this analysis clearly shows biases in HiGEM and ERAI relative to ISCCP. In HiGEM, high-topped, optically thick clouds dominate the composite and little variability is observed in the cloud top pressures. In ERAI, cloud tops are lower and exhibit greater variability than in HiGEM, though they do not compare closely with ISCCP.
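The ctp/τ distributions discussed above are of the ISCCP joint-histogram type. A minimal sketch of how per-pixel cloud top pressure and τ could be binned into such a histogram, and how the optically thick (τ ≥ 23) fraction referred to by Mace et al. (2011) falls out of it, is given below. The bin edges are the commonly used ISCCP-style values and are an assumption here (they should be checked against the simulator version in use); the function names are hypothetical.

```python
import numpy as np

# ISCCP-style bin edges (assumed; check against the simulator version in use).
CTP_EDGES_HPA = np.array([50, 180, 310, 440, 560, 680, 800, 1000], dtype=float)
TAU_EDGES = np.array([0.3, 1.3, 3.6, 9.4, 23.0, 60.0, 379.0])


def joint_histogram(ctp_hpa, tau, cloudy):
    """Fraction of all pixels falling in each (cloud top pressure, tau) bin.

    ctp_hpa, tau: 1-D arrays of per-pixel cloud top pressure (hPa) and optical depth
    cloudy:       boolean mask, False for clear pixels. Clear pixels are excluded
                  from the counts but kept in the denominator, so the histogram
                  sums to the (detectable) cloud fraction. Cloudy pixels with
                  tau < 0.3 fall outside the bins and are not counted.
    """
    ctp_hpa = np.asarray(ctp_hpa, dtype=float)
    tau = np.asarray(tau, dtype=float)
    cloudy = np.asarray(cloudy, dtype=bool)
    counts, _, _ = np.histogram2d(
        ctp_hpa[cloudy], tau[cloudy], bins=[CTP_EDGES_HPA, TAU_EDGES]
    )
    return counts / ctp_hpa.size


def thick_cloud_fraction(hist):
    """Fraction of pixels with tau >= 23 (the last two tau bins)."""
    return hist[:, TAU_EDGES[:-1] >= 23.0].sum()
```

Rows of the returned array run from high cloud (low pressure) to low cloud, and columns from optically thin to thick, so a composite dominated by high-topped, optically thick cloud shows up as weight concentrated in the top-right corner.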

CloudSat composite storm structure
There is some uncertainty in the magnitude of the differences in composite τ between HiGEM/ERAI and ISCCP. Profiles of reflectivity derived from the CloudSat simulator are useful for further evaluating the cloud structures. As noted in Sect. 2.3.2, CloudSat's sampling is sparse, though for HiGEM and ERAI the COSP simulator produces full 3D reflectivity fields, so sampling is not restricted. In all datasets, only retrievals greater than −30 dBZ are used, to take into account CloudSat's minimum detectable reflectivity of −30 dBZ (Tanelli et al. 2008).
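Applying the −30 dBZ detection threshold consistently to observed and simulated reflectivities can be sketched as follows; the second function also computes one plausible per-level hydrometeor occurrence fraction. The array layout (profiles × levels), the treatment of missing values as non-detections and the function names are assumptions, and the exact definition behind the cloud fractions in Figs. 6 and 7 may differ.

```python
import numpy as np

MIN_DETECTABLE_DBZ = -30.0  # CloudSat minimum detectable reflectivity (Tanelli et al. 2008)


def detectable(dbz, threshold=MIN_DETECTABLE_DBZ):
    """True where reflectivity exceeds the detection threshold; NaN (missing) -> False."""
    dbz = np.asarray(dbz, dtype=float)
    with np.errstate(invalid="ignore"):
        return dbz > threshold


def hydrometeor_fraction_by_level(dbz, threshold=MIN_DETECTABLE_DBZ):
    """Fraction of profiles with a detectable hydrometeor at each level.

    dbz: array shaped (n_profiles, n_levels) of reflectivity in dBZ.
    """
    return detectable(dbz, threshold).mean(axis=0)
```

Because the same threshold is applied to the simulated fields, model hydrometeors that CloudSat could not detect are excluded from the comparison rather than counted as cloud.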
In the simulated reflectivities from HiGEM and ERAI, the results are sensitive to the microphysical assumptions which are specified in COSP and the distribution (Bodas-Salcedo et al. 2008; Franklin et al. 2013). As such, the key conclusions of this evaluation are unlikely to be influenced by the microphysical assumptions used here in COSP, though we caution against interpreting the mean reflectivity in absolute terms and would emphasise a focus on the gross structural differences in the distribution of hydrometeors that can be inferred from this analysis. Figs. 6 and 7 show composite mean reflectivity (dBZ) at ≈1.7 and ≈5.5 km, respectively, for HiGEM, ERAI and CloudSat. As for ISCCP, the data are averaged over the 5 timesteps adjacent to the time of compositing. The CloudSat composites are noisier due to the lower sampling, although the broad structure of the WCB can be observed. The highest frequency of individual CloudSat overpasses which detect hydrometeors (Figs. 6c, 7c) corresponds to the location of highest cloud fraction in the ISCCP composite in Fig. 3a. Rain/snow mixing ratios in HiGEM and ERAI are at their highest in the WCB region at 1.7/5.5 km (not shown), leading to the band of higher reflectivity observed in these composites. Both HiGEM and ERAI underestimate reflectivity at 5.5 km when compared to CloudSat. This may result from precipitation being produced at too low an altitude or from a lack of larger hydrometeors where there is precipitation. Bodas-Salcedo et al. (2008) have shown (in their Figure 4) that an earlier version of the Met Office model produced limited variability in reflectivity across the North Atlantic because precipitation was produced by the convection scheme, even though the mean precipitation was well estimated.
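How a composite mean reflectivity is formed matters for the caution above: averaging in linear Z (mm⁶ m⁻³) weights intense echoes far more heavily than averaging dBZ values directly. The text does not state which convention the composites use, so the sketch below (hypothetical function names) offers both. Stacking the five timesteps around the compositing time along the first axis would give the time-window average described above.

```python
import numpy as np


def dbz_to_linear(dbz):
    """Convert reflectivity from dBZ to linear Z (mm^6 m^-3)."""
    return 10.0 ** (np.asarray(dbz, dtype=float) / 10.0)


def linear_to_dbz(z):
    """Convert linear Z (mm^6 m^-3) to dBZ."""
    return 10.0 * np.log10(z)


def composite_mean_dbz(dbz, linear=True):
    """Mean reflectivity over the leading axis (e.g. timesteps or storms).

    linear=True averages in Z and converts back to dBZ;
    linear=False averages the dBZ values directly.
    """
    dbz = np.asarray(dbz, dtype=float)
    if linear:
        return linear_to_dbz(dbz_to_linear(dbz).mean(axis=0))
    return dbz.mean(axis=0)
```

For nine pixels at −20 dBZ and one at 20 dBZ, the dBZ mean is −16 dBZ while the linear-Z mean is about 10 dBZ: a single intense echo dominates the linear average, so a small number of deep, heavily precipitating columns can control a linearly averaged composite.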
In Fig. 6 it can be seen that the magnitudes of the in-cloud mean reflectivity (dBZ_in) and the mean reflectivity from all points (dBZ_all) in HiGEM and ERAI at 1.7 km are broadly comparable to CloudSat, albeit with some differences in the structure of the band of higher (>0 dBZ) reflectivities. In Fig. 6d-f cloud fraction is shown. This is a different metric from that of ISCCP, which provides a two-dimensional measure of cloud cover at each point, and the instruments also have differing sensitivities; for example, the ISCCP cloud fraction could be underestimated whilst the CloudSat fraction at each level could be close to observations if there were too much cloud overlap in the models relative to observations. HiGEM represents low level cloud fraction, both the absolute fraction In ERAI, there is slightly too much cloud. In both HiGEM and ERAI, the location of the highest cloud fraction (>65 %) is slightly towards the northern side of the composite, whereas the location of highest reflectivity is further south. In CloudSat, the two maxima are coincident, around the point where the warm front is forcing large-scale ascent. In Fig. 7d-f, the cloud fractions are much lower in HiGEM than in CloudSat. In ERAI, they remain comparable to CloudSat, though the mean reflectivity in ERAI is much lower. Given the differences in cloud fraction seen in Fig. 3a-c, where both HiGEM and ERAI had lower cloud fraction estimates than ISCCP, these results may suggest that both the model and the reanalysis have limitations in simulating cloud processes. They may be producing too few optically thin, non-precipitating clouds or have too much vertical cloud overlap (resulting in the underestimate compared to ISCCP), while producing a better two-dimensional distribution of precipitating clouds (leading to the better comparison of low-level mean reflectivity with CloudSat). These precipitating clouds, however, may produce precipitation at too low a level or fail to produce the most intense, deep precipitating events, which may dominate the mean reflectivity at upper levels (hence the underestimates of reflectivity at 5.5 km).
Figure 8 shows the vertical structure of reflectivities in the WCB, with ERAI exhibiting large biases in the distribution of reflectivities greater than 0 dBZ relative to the observations. The HiGEM reflectivity structure indicates that, although the magnitude of reflectivities is underestimated at higher altitudes, HiGEM is better able to capture the deep cloud structure at the ETC centre, reaching similar heights to CloudSat (Fig. 8a, c). This can be reconciled with the simulated cloud top pressure seen in Fig. 3d-f. The inability of ERAI to reproduce the vertical extent of higher mean reflectivities in the WCB also corresponds with the cloud top pressure biases in Fig. 3d-f. The vertical extent of the values of >0 dBZ in CloudSat can be reconciled with Fig. 4f, where neither HiGEM nor ERAI reproduces the very highest clouds. These clouds/hydrometeors may be indicative of bands of deep, heavily precipitating convection in observations (a small sample of which could dominate the mean reflectivity). The cloud fraction contours in Fig. 8d-f show that HiGEM produces higher mean reflectivity from fewer clouds than ERAI, which may indicate that more precipitation is produced by the convection scheme in HiGEM than in ERAI, though HiGEM still does not capture the observed distributions. It has previously been shown that the ERAI cloud scheme fails to represent the ice water content distribution in the −20 to 0 °C range, which can be reconciled with the underestimate of reflectivity in ERAI at upper levels (see Delanoë et al. 2011). It is clear from the transects in Fig. 8 that in the CloudSat composite ETC, significant precipitation can occur both north and south of the axis of propagation, whilst precipitation in both HiGEM and ERAI mostly occurs towards the south, particularly in ERAI. This is an important finding in the context of the influence of latent heating in ETCs. Figure 5 has already shown that the optical depth biases in ERAI are associated with low altitude cloud liquid water, and the reflectivity structure in Fig. 6b, e is related to the location of rain at this level (not shown). In HiGEM, the highest reflectivity values at both 1.7 and 5.5 km are primarily associated with snow (not shown).
Figure 9 shows the fraction of pixels in the ETCs that exceed −20/−10/0 dBZ. As noted above, there is some uncertainty in the absolute values of simulated reflectivity, though the fraction above −20 dBZ can be considered to represent cloud occurrence broadly, and exceedance of the higher thresholds to represent light/heavy precipitation. As in Fig. 4, the statistics are derived from each of the ETCs in the composites and aggregated. In the in-cloud composites (a-c), at lower levels, both ERAI and HiGEM produce frequencies of reflectivities greater than −20 dBZ that are comparable to CloudSat, though with some underestimates of the frequencies greater than −10/0 dBZ.
At higher levels (above ~3 km), neither HiGEM nor ERAI is able to capture the frequency of occurrence for any of these thresholds, though HiGEM produces a greater frequency of occurrence than ERAI for all thresholds above 1.75 km. In the all-points composites (d-f), similar biases are found, with a failure to reproduce the CloudSat reflectivity distributions at higher altitudes. In ERAI, it is notable that the frequency with which reflectivities exceed −20/−10/0 dBZ is greater than in HiGEM below 1.75 km, indicating that there is more low altitude, precipitating cloud in ERAI than in HiGEM. CloudSat observations are not available near the surface, though these differences can be related to those seen in Figs. 3 and 4, particularly in the WCB inflow region, where cloud liquid water (Fig. 5c, d) is higher in ERAI and the frequency of >0 dBZ (Fig. 9c, f) is also higher than in HiGEM. In the region of heaviest precipitation at the location of maximum ascent (Fig. 9b, d), the frequency of reflectivities greater than −10/0 dBZ is underestimated by both ERAI and HiGEM at all levels where CloudSat observations are available. This suggests that neither is able to capture the frequency with which large hydrometeors (i.e. heavy precipitation) occur at any altitude. It is notable that the mean values of reflectivity, as seen in Fig. 6, do not readily reveal this discrepancy, which further indicates biases in precipitation processes in both HiGEM and ERAI. This analysis supports the hypothesis that precipitation is produced at too low an altitude (particularly in ERAI) and that there is a lack of variability in precipitation intensity, leading to an underestimate of the largest hydrometeors.
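The exceedance statistics of Fig. 9 can be sketched as per-level counts of reflectivities above −20, −10 and 0 dBZ, pooled over all ETCs and divided either by all points or by in-cloud points only. Here "in-cloud" is assumed to mean above the −30 dBZ detection threshold, missing values count as non-exceedances, and the function name is hypothetical.

```python
import numpy as np

THRESHOLDS_DBZ = (-20.0, -10.0, 0.0)
MIN_DETECTABLE_DBZ = -30.0


def exceedance_fractions(dbz, thresholds=THRESHOLDS_DBZ, in_cloud=False,
                         detect=MIN_DETECTABLE_DBZ):
    """Per-level fraction of points exceeding each threshold.

    dbz: array (n_points, n_levels) of reflectivity pooled over all ETCs.
    in_cloud=False divides by all points; in_cloud=True divides only by points
    above the detection threshold (an assumed definition of 'in-cloud').
    Returns an array (len(thresholds), n_levels); levels with no in-cloud
    points are returned as NaN.
    """
    dbz = np.asarray(dbz, dtype=float)
    with np.errstate(invalid="ignore"):  # NaN comparisons evaluate to False
        if in_cloud:
            denom = (dbz > detect).sum(axis=0).astype(float)
        else:
            denom = np.full(dbz.shape[1], float(dbz.shape[0]))
        out = np.empty((len(thresholds), dbz.shape[1]))
        for i, thr in enumerate(thresholds):
            num = (dbz > thr).sum(axis=0)
            with np.errstate(divide="ignore"):
                out[i] = np.where(denom > 0, num / denom, np.nan)
    return out
```

The two denominators answer different questions: the all-points fraction mixes cloud occurrence with precipitation intensity, whereas the in-cloud fraction isolates how often detected cloud is precipitating heavily, which is why the two sets of panels can show different biases.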
The composites evaluated in this section support the conclusions of the ISCCP analysis and provide additional information on the vertical structure of composite ETCs in both HiGEM and ERAI. Both HiGEM and ERAI have biases in cloud structure around ETCs, with HiGEM found to have too much high cloud around the inflow region and ERAI found to have an excess of low altitude cloud. Neither is able to capture the deepest, intensely precipitating events. Consequently, both HiGEM and ERAI are likely to have biases in diabatic heating around composite ETCs.

Diabatic tendencies and PV
Diabatic heating can play a major role in the development of ETCs, in particular through the way PV is modified. Given the biases found in cloud structure in the previous sections, any differences in the dynamical evolution of composite ETCs in HiGEM and ERAI may be related to diabatic heating.
Heating tendencies from the parameterisation schemes which include diabatic effects have been evaluated, with the greatest heating contributed by the large-scale cloud/precipitation and convection schemes. Heating tendencies from the radiation schemes in both HiGEM and ERAI were significantly weaker than those from these two schemes (not shown). This was also found in the study of Martínez-Alvarado and , which evaluated a single ETC in the Met Office Unified Model.
In Fig. 10, the total heating from the large-scale and convection schemes combined, and the contribution of each scheme to the total heating at 700 hPa, are shown. The total heating is comparable (a, d), though its source is very different. Convective heating is the dominant term in HiGEM (c) and large-scale heating in ERAI (e).
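The decomposition in Fig. 10 amounts to total = large-scale + convective heating (radiative heating, found to be much weaker, is left out). A minimal sketch of computing each scheme's share at every grid point follows. The function name is hypothetical, and masking points with non-positive total heating is a choice made here to avoid dividing by near-zero values, not necessarily what was done for the figure.

```python
import numpy as np


def heating_contributions(large_scale, convective):
    """Split total latent heating into per-scheme contributions.

    large_scale, convective: heating tendencies (K h^-1) on the same grid.
    Returns (total, large-scale fraction, convective fraction). Fractions are
    NaN where the total heating is not positive.
    """
    ls = np.asarray(large_scale, dtype=float)
    cv = np.asarray(convective, dtype=float)
    total = ls + cv
    with np.errstate(invalid="ignore", divide="ignore"):
        frac_ls = np.where(total > 0, ls / total, np.nan)
        frac_cv = np.where(total > 0, cv / total, np.nan)
    return total, frac_ls, frac_cv
```

Two datasets can have nearly identical totals while the fractions differ completely, which is the situation described for HiGEM (convection-dominated) and ERAI (large-scale-dominated).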
In Fig. 11, vertical transects through the location of the WCB and warm front are shown. The most intense total heating in ERAI (d) is focussed on the area of rapid ascent at the warm front (see Fig. 2), where the large-scale scheme (e) produces heating rates of up to 0.3 K h⁻¹. The convective heating (f) does not exceed 0.15 K h⁻¹. In HiGEM, maximum convective heating rates (c) reach 0.6 K h⁻¹, while large-scale heating rates (b) do not exceed 0.15 K h⁻¹. Total heating rates are greater in HiGEM (a), exceeding 0.65 K h⁻¹, whereas the maximum total heating in ERAI (d) is less than 0.41 K h⁻¹.
In both Figs. 10 and 11, the heating from the large-scale scheme in HiGEM is spatially offset from the heating associated with the convection scheme. The only significant large-scale heating in HiGEM occurs where ascent takes place at the warm front. In ERAI, in contrast, a small amount of convective heating occurs ahead of the region of frontal ascent (as denoted by the θe contours in Fig. 11 and vertical velocities in Fig. 2), with significant large-scale heating then occurring as air ascends on isentropic surfaces at the warm front. This behaviour leads to differing vertical profiles of heating, with greater heating at higher altitudes (above 700 hPa) in HiGEM (total column integrals have been calculated, though they are not shown here). This heating in HiGEM is largely due to the convection scheme, bearing out the hypothesis of Catto et al. (2010). It is outside the scope of this study to investigate the causes of the difference in behaviour between HiGEM and ERAI in any detail, though it is of interest to note that the convection scheme in ERAI is active far more frequently than in HiGEM (not shown), although it is typically limited to shallow, weak convection. HiGEM, on the other hand, tends to diagnose deep convection, though less frequently. This is discussed further in Sect. 4 and may be one reason for the relatively lower cloud fraction in HiGEM compared with ERAI (Fig. 3a, b).
Latent heat release generates positive (cyclonic) PV anomalies below and negative (anticyclonic) PV anomalies above the location of heating, which modify the evolution and structure of ETCs. In Fig. 12, vertical transects of several variables, including PV, from the composites are displayed. The heating contours in HiGEM can be related to the enhanced low level PV anomaly in HiGEM relative to ERAI. Although it is not possible to attribute causality within the framework of this study, a strengthened low level PV anomaly has been shown to induce stronger low level flow within WCBs (Plant et al. 2003), which provides a positive feedback to the latent heating. Owing to the relative magnitude of the heating, the low level PV modification in HiGEM is larger than in ERAI (up to ∼0.5 PVU at maximum). Above the heating maximum, the evolution of the leading edge of the tropopause fold can also be contrasted in Fig. 12g-l. In HiGEM, greater mid-level latent heating can be associated with lower PV above and ahead of the WCB heating maximum (above 500 hPa at 10°) as negative PV anomalies are advected by the WCB outflow. In ERAI, the upstream tropopause ridge structure is less pronounced than in HiGEM. The decay of the composite ETC coincides with the reduction in latent heating from 12 h prior to maximum intensity (Fig. 12, second and third columns). The well-defined tilt between the cyclone centre and the tropopause fold during the development phase (Fig. 12g, h, j, k) weakens as the storms reach maximum dynamical intensity (Fig. 12i, l). The composites become barotropic and decay simultaneously with the reduction of heating.
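The PV argument above rests on the diabatic PV tendency. Keeping only the dominant vertical term and writing it in pressure coordinates gives DPV/Dt ≈ −g(f + ζ) ∂θ̇/∂p, so PV increases below the heating maximum (where θ̇ decreases with increasing pressure) and decreases above it. The sketch below evaluates this for a single column with an idealised Gaussian heating profile and a constant absolute vorticity of 10⁻⁴ s⁻¹. The profile, its 0.6 K h⁻¹ peak near 600 hPa and the neglect of horizontal terms are all assumptions for illustration, not a diagnosis of the HiGEM or ERAI composites.

```python
import numpy as np

G = 9.81      # gravitational acceleration, m s^-2
PVU = 1.0e-6  # 1 PVU = 1e-6 K m^2 kg^-1 s^-1


def pv_tendency_pvu_per_hour(p_pa, heating_k_per_hour, eta=1.0e-4):
    """Vertical-term approximation DPV/Dt ~= -g * eta * d(theta_dot)/dp.

    p_pa:               pressure levels in Pa (monotonic)
    heating_k_per_hour: diabatic heating theta_dot on those levels, K h^-1
    eta:                absolute vorticity f + zeta, s^-1 (held constant; an assumption)
    Returns the PV tendency in PVU h^-1.
    """
    theta_dot = np.asarray(heating_k_per_hour, dtype=float) / 3600.0  # K s^-1
    dtheta_dot_dp = np.gradient(theta_dot, np.asarray(p_pa, dtype=float))
    dpv_dt = -G * eta * dtheta_dot_dp  # K m^2 kg^-1 s^-2
    return dpv_dt / PVU * 3600.0       # PVU h^-1


if __name__ == "__main__":
    p = np.linspace(100_000.0, 30_000.0, 71)  # 1000 to 300 hPa in 10 hPa steps
    # Hypothetical Gaussian heating peaking at 0.6 K h^-1 near 600 hPa.
    heating = 0.6 * np.exp(-(((p - 60_000.0) / 15_000.0) ** 2))
    tend = pv_tendency_pvu_per_hour(p, heating)
    below = p > 60_000.0  # higher pressure = below the heating maximum
    above = p < 60_000.0
    print(f"max PV tendency below heating max: {tend[below].max():.3f} PVU/h")
    print(f"min PV tendency above heating max: {tend[above].min():.3f} PVU/h")
```

With these made-up inputs the peak tendency is a few hundredths of a PVU per hour, positive below 600 hPa and negative above, so heating sustained over many hours can plausibly build anomalies of a few tenths of a PVU; shifting the heating maximum upwards, as in HiGEM, shifts the dipole upwards with it.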
The height which air masses reach within the WCB is important, since PV anomalies are generated by the heating associated with this ascent and can lead to modification of the tropopause (e.g., Grams et al. 2011; Joos and Wernli 2011; Methven 2013). Figure 13 shows PV on the 315 K isentropic surface. As the composite ETCs evolve towards maximum intensity, the ridging feature in HiGEM associated with the anticyclonic WCB outflow becomes more pronounced than in ERAI. Relating this to processes in the WCB, the contrasting tropopause structures can be connected to the heating differences in Fig. 12, with the implication that PV modification by latent heating plays a role in the divergent evolution of PV at upper levels. These results are consistent with case study analyses of the relationship between diabatic processes in the WCB and the upper-level flow (e.g., Stoelinga 1996; Grams et al. 2011). The representation of the upper-tropospheric circulation around ETCs has implications for the downstream development of ETCs, Rossby waves and blocking (e.g., Chagnon et al. 2012; Rodwell et al. 2013; Gray et al. 2014; Martínez-Alvarado et al. 2015; Pfahl et al. 2015). Any biases in this circulation may affect the ability of HiGEM to simulate ETCs at the eastern end of the storm tracks, such as those ETCs which affect Europe.
Figure caption (beginning missing): … in HiGEM (a-c, g-i) and ERAI (d-f, j-l) in vertical cross-sections denoted as A-B (for a-f) and C-D (for g-l) in Fig. 10. Total latent heating tendency (large-scale and convection) is shown as closed contours at 0.4, 0.5 and 0.6 K h⁻¹. θe is shown with narrowly dashed lines, with contours as labelled (K). Times are denoted as hours relative to maximum dynamical intensity (white text in top right of each panel; hour 0 is maximum intensity). Plot radii are 20°. An animation through the composite lifecycle is available as Supplementary Figure S4.
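PV on the 315 K surface (Fig. 13) requires interpolating pressure-level fields to an isentrope. A minimal single-column sketch is given below, assuming θ increases monotonically with height (no folded or superadiabatic layers, which operational tools handle more carefully) and linear interpolation in θ. The constants are standard dry-air values and the function names are hypothetical.

```python
import numpy as np

P0 = 100_000.0  # reference pressure, Pa
KAPPA = 0.2857  # R_d / c_p for dry air (approximately 2/7)


def potential_temperature(t_k, p_pa):
    """theta = T * (p0 / p) ** kappa, in K."""
    return np.asarray(t_k, dtype=float) * (P0 / np.asarray(p_pa, dtype=float)) ** KAPPA


def interp_to_isentrope(field, theta, theta_target=315.0):
    """Linearly interpolate a column field to a target isentropic level.

    Assumes theta is strictly monotonic along the column (statically stable);
    returns NaN if theta_target lies outside the column's theta range.
    """
    theta = np.asarray(theta, dtype=float)
    field = np.asarray(field, dtype=float)
    if theta[0] > theta[-1]:  # reorder so theta increases, as np.interp requires
        theta, field = theta[::-1], field[::-1]
    if np.any(np.diff(theta) <= 0):
        raise ValueError("theta must be strictly monotonic in the column")
    if not theta[0] <= theta_target <= theta[-1]:
        return np.nan
    return float(np.interp(theta_target, theta, field))
```

Applied column by column to composite PV and temperature on pressure levels, this yields a 315 K PV map in which a ridge of low PV marks where WCB outflow has displaced the tropopause.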

Discussion and conclusions
This study has investigated the ability of HiGEM to represent processes associated with latent heat release in ETCs using a tracking and compositing technique. HiGEM has been evaluated using both the ERAI reanalysis and remote sensing data from ISCCP and CloudSat. ISCCP and CloudSat have differing capabilities and sensitivities, so offer complementary tools for assessing cloud and precipitation structure around ETCs.
The key findings of this study are as follows:
• HiGEM and ERAI are shown to exhibit biases in their representation of clouds associated with ETCs. Both produce lower cloud fractions than ISCCP. HiGEM has a consistent high cloud top bias in the WCB of ETCs. In ERAI, a low cloud top bias is found in the WCB. Both HiGEM and ERAI have higher optical depths in the WCB than ISCCP.
• Composite ETCs from CloudSat indicate that neither HiGEM nor ERAI produces sufficient variability in precipitation in ETCs, with neither capturing the most intense events. ERAI in particular produces precipitation at too low an altitude within the WCB.
• Total latent heating tendencies in the WCB are greater in HiGEM than in ERAI. The majority of the latent heating in HiGEM is produced by the convection scheme, whilst in ERAI the majority is produced by the large-scale cloud scheme. The vertical profiles of heating in the two datasets differ, with greater heating at higher altitudes in HiGEM.
• Differences in latent heating amount and location can be related to the structure and evolution of potential vorticity.
Latent heating can play a significant role in the evolution of ETCs through its role in the generation of potential vorticity anomalies (e.g., Hoskins et al. 1985; Hertenstein and Schubert 1991; Pomroy and Thorpe 2000; Smith 2000; Rogers and Fritsch 2001; Beare et al. 2003; Methven 2013). As such, poor representation of latent heating can have a significant influence on the evolution of ETCs in models. Willison et al. (2013), investigating the impact of resolving mesoscale heating features within ETCs in the Atlantic storm track, found that a higher resolution model which was better able to represent these features produced a more vigorous storm track. The representation of latent heating within ETCs may therefore affect the representation of the wider extratropical environment in climate models.
From a model development perspective, these results imply that evaluating the behaviour of a model's convection scheme in the mid-latitudes is worthwhile, as benefits in model performance gained elsewhere may be at the expense of this region (see also Booth et al. 2013). There are multiple differences in the convection schemes in ERAI and HiGEM, though differences in the CAPE closure timescale (fixed at 60 min in ERAI and varying from 60 to 5 min in HiGEM based on relative humidity) would, in the authors' opinion, be a sensible first candidate for investigation of the causes of the divergent behaviour seen in this study. Given the uncertainties associated with the assumptions used in satellite simulators, further research into the sensitivity of the representation of the key features (e.g., the vertical structure of precipitation, which dominates reflectivity) around ETCs to these assumptions would allow more detailed evaluation of the fidelity with which cloud/precipitation processes are represented than has been possible within this study.
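The CAPE closure mentioned above relaxes CAPE towards zero over a timescale τ_c, so the rate at which convection consumes CAPE scales as CAPE/τ_c, and a 5 min timescale removes CAPE twelve times faster than a 60 min one. The sketch below illustrates this. The relative-humidity thresholds, the linear ramp and the direction of the RH dependence are hypothetical placeholders, not the HiGEM formulation, which is not given here; the function names are likewise hypothetical.

```python
def cape_timescale_minutes(rh, rh_low=0.6, rh_high=0.9, t_max=60.0, t_min=5.0):
    """Hypothetical RH-dependent CAPE closure timescale, in minutes.

    Returns t_max at or below rh_low, t_min at or above rh_high and a linear
    ramp in between. The functional form, thresholds and direction of the RH
    dependence are assumptions, NOT the HiGEM formula.
    """
    if rh <= rh_low:
        return t_max
    if rh >= rh_high:
        return t_min
    frac = (rh - rh_low) / (rh_high - rh_low)
    return t_max + frac * (t_min - t_max)


def cape_removal_rate(cape, timescale_minutes):
    """CAPE closure: rate at which convection consumes CAPE, J kg^-1 s^-1."""
    return cape / (timescale_minutes * 60.0)


if __name__ == "__main__":
    cape = 500.0  # hypothetical CAPE, J kg^-1
    for rh in (0.5, 0.75, 0.95):
        tau_c = cape_timescale_minutes(rh)
        print(f"RH = {rh:.2f}: tau_c = {tau_c:5.1f} min, "
              f"rate = {cape_removal_rate(cape, tau_c):.3f} J kg^-1 s^-1")
```

A scheme whose timescale collapses towards 5 min in moist environments, such as WCB inflow air, would consume instability much more aggressively than one fixed at 60 min, which is why the closure timescale is a natural first candidate for investigation.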
The results in this study relate differences in latent heating within the WCB in HiGEM and ERAI to the structure and evolution of the potential vorticity (PV) field both within the WCB and downstream of the location of maximum latent heating. PV modification in the WCB in HiGEM can be related to upper level PV anomalies which influence the evolution of the high pressure ridge in the region the ETC is propagating towards. This has implications for the interaction of ETCs in HiGEM with their environment, such as their role in Rossby wave breaking or downstream development.
The findings of this study are significant for both HiGEM and ERAI, since the structure of the cloud fields in the WCB is found to be deficient in both datasets and the location and magnitude of latent heating have been related to the cloud fields. ERAI is frequently used as a baseline dataset for evaluating climate models, and these results indicate that caution is required when using ERAI to evaluate cloud structure and diabatic processes around ETCs. One question to be addressed is what impact the biases in the representation of ETC cloud structure and latent heating in HiGEM, in addition to previous work showing biases in the representation of precipitation associated with ETCs (Hawcroft et al. 2015), have on future projections of ETCs and the storm tracks produced by HiGEM. More generally, the methodology applied in this study presents a way to constrain climate model simulations observationally and allows a phenomenon-centred approach to evaluating sources of bias.
Many climate models exhibit biases in their representation of the storm tracks, both in terms of their location and variability (e.g., Matsueda et al. 2009; Zappa et al. 2013). In a warmer climate with increased moisture availability, the behaviour of ETCs may change. As such, models must be able to represent diabatic processes in ETCs to provide robust projections of extratropical climate. This study provides further motivation for comprehensively evaluating the ability of GCMs to represent the structure and processes around ETCs, given their possible role in wider climatological model biases and uncertainties in future projections. One way that these evaluation capabilities could be enhanced would be through the development of improved observational systems for the extratropics, for example, through deployment of a satellite-borne precipitation radar capable of gathering three-dimensional storm structure, as the Tropical Rainfall Measuring Mission (TRMM, Kummerow et al. 1998) does in the tropics.