1 Introduction

As urbanisation levels and urban populations continue to grow (United Nations 2018), there is an increasing need for urban climate services (Baklanov et al. 2018). Climate models need to represent future extremes correctly, such as the intensity and frequency of heat waves, droughts or floods. Urban areas can strongly exacerbate such events at the local scale due to their impact on the surface energy balance (SEB) and surface hydrology. Global climate models (GCM) have become increasingly high resolution, with frontier runs allowing horizontal resolutions (N1280 HadGEM3–PRIMAVERA (\(\sim \)10 km) Vidale et al. in. prep.) that are similar to coarse resolution numerical weather prediction (NWP) applications (e.g. ECMWF; Holm et al. 2016). This means that large metropolitan areas like London cover several grid cells (assuming accurate land-cover information) and should be represented accordingly. Climate models can provide information on the likelihood and nature of (long-term) future extreme events under various forcing conditions. However, to develop appropriate plans and responses based on integrated urban services (WMO 2018), scenario runs should ideally be made at NWP resolution. To capture the impact of cities on surface–atmosphere interactions, urban land-surface models (ULSM) are used within NWP and climate models. ULSMs provide a solution to the urban surface-energy balance (USEB; e.g. Oke et al. 2017):

$$ Q^{\ast} + Q_{F} = Q_{H} + Q_{E} + \varDelta Q_{S} , $$

where Q = (KK) + (LL) is the net all-wave radiation composed of the incoming () and outgoing () shortwave (K) and longwave (L) radiation components. QF is the anthropogenic heat flux, and QH and QE the turbulent sensible and latent heat fluxes. ΔQS is the net storage heat flux that includes the storage of heat within the urban fabric (thermal inertia, CdT/dt, where C is the areal capacity and T the surface temperature; Porson et al. 2010a), and the ground-heat flux. Net horizontal advection of heat or moisture is included on the right side of Eq. 1 if the ULSM is coupled to an atmospheric model.

1.1 Representation of urban processes

Urban processes are represented in atmospheric models with varying levels of morphological and physical complexity depending on computational resources and resolution (Fig. 1). ULSMs used in NWP or climate simulations parameterise the effects of urban environments on surface fluxes without representing buildings explicitly. Numerous ULSMs exist (Grimmond et al. 2009) using different approaches to model sub-grid scale processes (Grimmond et al. 2010; Grimmond et al. 2011). With the ‘tile’ approach, urban areas are treated separately and in parallel with other surfaces in the grid box (e.g. vegetation or water). Simpler modelling approaches modify vegetation schemes (e.g. Best 2005) by using bulk surface parameters that are representative of impervious, urbanised surfaces (e.g. increased heat capacity and surface roughness). More complex models use parameterisations of processes in simple street canyons, for which morphology parameters can vary spatially (e.g. Masson 2000; Martilli et al. 2002; Porson et al. 2010a). A key problem at this scale is the description of the subgrid-scale heterogeneity of the urban surface in the models (Barlow et al. 2017).

Fig. 1
figure 1

Representation of urban areas at different spatial scales (domain sizes from \(\mathcal {O}(1000\) km) to \(\mathcal {O}(10\) m)) and model resolutions together with prevailing modelling approaches, with H, mean building height; σH, standard deviation of building heights; W, street-canyon width; z0, roughness length for momentum; λp, plan-area fraction; λf, frontal-area fraction

Increasing the level of detail with which urban processes are modelled becomes important as grid resolutions approach the urban grey-zone regime (\(\mathcal {O}(10-100\) m)) both in terms of physics (turbulent exchange is now partially resolved) and buildings (morphology variations from one grid box to the next can be significant). Neighbourhood to building-scale simulations feature building-dominant processes that need to be directly resolved and coupling of indoor–outdoor environments becomes possible.

1.2 Focus of this study

The choice of ULSM for a specific application is often constrained by practical considerations (e.g. Best 2006a), for example the computational resources available. With increasing model complexity, the demands on spatially varying ancillaries increase. Land-cover and building morphology/material information or population density and energy-use data to represent anthropogenic heat emissions may not be readily available at the spatial/temporal scales of interest.

The Met Office Unified Model (UM) is routinely run on regional to global scales in both climate mode (HadGEM) and NWP setups using different complexity ULSMs available in the Joint UK Land Environment Simulator (JULES; Best et al. 2011; Clark et al. 2011). The UM–JULES modelling suite has many international users beyond the UK, e.g. meteorological services in Australia, Singapore and South Korea, and an increasing demand for its application over highly urbanised areas of the world such as China and India. While HadGEM simulations currently use the simple one-tile (1T) urban scheme in JULES (Best 2005), since March 2016 the regional UKV NWP model (1.5-km horizontal resolution over the UK) is run operationally with the urban-canopy model MORUSES (Met Office–Reading Urban Surface Exchange Scheme; Porson et al. 2010a, 2010b; Bohnengstengel and Hendry 2016).

In this study, we focus on the following questions:

  1. 1.

    What are the implications of ULSM options, together with resolution and accuracy of ancillary information, in current UM–JULES modelling environments used for NWP and climate simulations?

  2. 2.

    How can current urban modelling capabilities in UM–JULES be improved, especially with regard to using the model output for integrated urban services?

To address these questions, the JULES ULSMs are evaluated offline against multi-year flux observations in the UK in two different urban settings: (i) the high-density, high-rise city centre of London (King’s College London Strand campus; KCL); and (ii) the suburban town of Swindon, located \(\sim \)115 km west of London. The offline runs use the JULES configurations of (i) the operational UKV NWP setup and (ii) Global Land 7.0 (GL7.0; Walters et al. 2019; Wiltshire et al. 2020) used in HadGEM3 climate simulations. This is the first time that the JULES ULSMs are evaluated together over multi-year periods, allowing the study of seasonal variations in model performance under a range of different conditions.

2 Urban models in JULES

JULES is a community land-surface model that uses a tile approach to represent the subgrid-scale heterogeneity of land cover (Best et al. 2011). This allows no interaction between tiles below the first atmospheric model level, where the total (grid box-averaged) surface fluxes are obtained by weighting the tile variables by their plan-area fractions (Best et al. 2006).

Urban areas are represented in JULES either using the (Best 2005) single urban tile scheme (Best 1T) or using a 2-tile (2T) approach that splits the impervious urban surface into a roof and a street canyon. For the 2T approach, JULES has two options: (i) a simple 2T slab scheme (Best 2T; Best et al. 2006) and (ii) MORUSES (Porson et al. 2010a; 2010b).

2.1 Representation of the USEB

The Best 1T and Best 2T approaches solve the SEB of a modified vegetation canopy that interacts with the first soil level via longwave radiative coupling. Urban bulk parameters for the canopy (Best 1T) or the roof/canyon combination (Best 2T) are prescribed (urban albedo αU, urban emissivity εU, bulk areal heat capacity CU, roughness length for momentum z0 and the ratio zh/z0, where zh is the roughness length for heat). The maximum available surface water mass and infiltration rates are reduced to limit the evaporation over the impervious tile (Best 2005). In standard UM–JULES configurations, urban parameters for the Best models are the same for each city and across a city. While this is not an intrinsic model limitation, allowing for spatial variations of model parameters would require significant code changes in JULES.

MORUSES includes several modifications of the above: (i) bulk parameters for the street-canyon tile (canyon albedo αc, emissivity εc and areal heat capacity Cc) are calculated from canyon geometry and prescribed material properties for roads and walls (material albedo α, emissivity ε, volumetric heat capacity c, thermal diffusivity k); (ii) z0 and zh are calculated explicitly (Section 3.3); and (iii) spatially varying urban morphology data can be provided (Bohnenstengel et al. 2011). The presence of a street canyon (building height H, street width W ) in MORUSES allows modelling of short- and longwave radiative trapping. Turbulent scalar fluxes are computed through a resistance network that responds to canyon geometry, modified from Harman et al. (2004). Three flow regimes are used: (i) isolated-building regime (H/W ≤ 1/3), (ii) wake interference (1/3 < H/W ≤ 2/3) and (iii) skimming flow (H/W > 2/3). The net storage heat flux (ΔQS) depends on thermal material properties and canyon geometry (Porson et al. 2010a). The canyon/roof tiles are conductively/radiatively coupled to the ground. For the canyon tile, only the road facet is coupled to the soil (not the walls). The insulation behaviour of the roof tile can be controlled via a prescribed thermal diffusivity or by using a very small roof thickness, Δzf (Porson et al. 2010b).

2.2 Treatment of urban vegetation

UM–JULES typically uses five plant functional types (PFT) in separate tiles: broadleaf tree, needleleaf tree, C3 grass, C4 grass, shrubs. Urban vegetation is modelled using these vegetation tiles and not as part of the urban SEB. As a result, processes in built and vegetated areas do not interact below the first model level and QE only originates from urban tiles after rainfall. Shortcomings of this approach have been discussed by Grimmond et al. (2011) and Loridan and Grimmond (2012a, 2012b).

Prescribing urban anthropogenic moisture sources (e.g. irrigation) in JULES is not possible currently (Best and Grimmond 2016). While this is not critical for the evaluation sites considered in this study, effects of urban irrigation can be important in some cities (e.g. Ao et al. 2018; Dou et al. 2019).

2.3 Anthropogenic heat emissions

Currently, JULES (vn5.4) has two options for the emission of QF over the urban tile(s): (i) effectively none (0 W m− 2) and (ii) fixed values (JULES–QF) derived from the Digest of United Kingdom Energy Statistics (DUKES 2003). The latter is used operationally in the UKV and represents a UK-averaged monthly mean QF for 1995–2003 adjusted to the fraction dissipated over impervious/built areas. Consistent with this, when QF is added to the SEB in JULES it is only applied to the urban tile(s) and set to zero elsewhere. Hence, there is spatial variability at the grid scale as it is proportional to the urban fraction in the grid box. However, there are no local adjustments that account for differences between city centres and suburban areas, which arise from changes in building type and volume, population density and traffic volume (e.g. Sailor 2011; Iamarino et al. 2012). The minimum (unscaled; i.e. 100% urban) monthly JULES QF for the UK occurs in August (\(\sim \)17 W m− 2), the maximum in December (\(\sim \)26 W m− 2; Fig. 2).

Fig. 2
figure 2

Monthly anthropogenic heat flux (QF) specified in JULES (both unscaled and adjusted to urban fractions in London/Swindon), together with GQF output for central London (Kotthaus and Grimmond 2014a) and inventory heat emissions for suburban Swindon (Ward et al. 2013) aggregated as monthly medians plus inter-quartile range

Figure 2 shows JULES–QF together with results from the GreaterQf model (GQF; Iamarino et al. 2012; Gabey et al. 2019) for central London (January 2011–December 2013; Kotthaus and Grimmond 2014a), and heat emissions for Swindon derived from inventory data (January 2011–September 2013; Ward et al. 2013). To compare with JULES, the hourly QF data for London and Swindon are aggregated into monthly medians. JULES - QF is shown both in its unscaled (100% urban) version and scaled to the urban tile proportions at the two sites (66% for London; 49% for Swindon; Table 1). The scaled values are representative of those used in the simulation.

Table 1 Plan-area land-cover fractions used in the offline runs for (a) London/KCL and (b) Swindon

In MORUSES, QF is either released only in the canyon or is evenly distributed among canyon and roof. The latter is the UKV default. The prescribed monthly values are linearly interpolated to the model time step, but diurnally constant. Hence, variations of QF related to sleep/work patterns, different energy demands on weekdays/weekends, public holidays or as a result of daylight-saving time, day-to-day or temperature conditions (Grimmond 1992; Sailor and Vasireddy 2006; Gabey et al. 2019) are not reflected. Using a fixed daily value can result in too large/low QF releases during the night/day (Grimmond et al. 2010).

3 Offline JULES–USLM configurations

JULES is run offline over 3 years (January 2011–December 2013) at an hourly time step using surface observations in central London and suburban Swindon as forcing data (Section 4; Table 5a). A modified version of JULES vn5.2 is used that includes a fix enabling the radiative coupling of the MORUSES roof tile to the soil. This fix is part of the JULES release from vn5.4 and is expected to be included in the next available operational UKV release cycle. We compare three configurations (cf. Appendix 1):

  1. 1.

    HadGEM: JULES–GL7.0 science configuration (Walters et al. 2019; Wiltshire et al. 2020) and \(\sim \)10 km resolution ancillaries of the N1280 HadGEM3–PRIMAVERA GCM system; using Best 1T (hereafter labelled: HAD–B)

  2. 2.

    UKV: Operational JULES–UKV science configuration (OS40) and 1.5-km resolution UKV ancillaries (OS41); using MORUSES (UKV–M)

  3. 3.

    Control: Operational JULES–UKV (OS40) together with high-resolution (1 m) land cover, building morphology and roughness data and the JULES leaf-phenology model (Cox 2001; Clark et al. 2011); using MORUSES (CTRL–M) and Best 1T (CTRL–B)

It is noted that while the CTRL–B runs use JULES–UKV, the SEB statistics remain mostly unchanged if JULES–GL7.0 is used with urban parameters and QF options kept the same (cf. seasonal diurnal statistics in the Online Resource Figs. S4, S5). Small differences stem from the hydrology and inland-water parameters (Appendix 1). Hence, ‘CTRL–B v. HAD–B’ (as ‘CTRL–M v. UKV–M’) differences are not from the model science, but reflect the impact of ULSM parameter choices, treatment of QF and differences in prescribed land cover.

3.1 Land-cover information

The operational UKV land-cover ancillaries are based on \(\sim \)25-m resolution 1990 UK land-cover data (CEH 1990). The HadGEM3–PRIMAVERA ancillaries are derived from global \(\sim \)1-km resolution data of the International Geosphere–Biosphere Project (IGBP; reference years 1995/1996; Walters et al. 2019, and references therein).

Figure 3a shows an urban versus non-urban binary land-cover classification over the south-east UK from the Global Urban Footprint (GUF) project that used TerraSAR–X/TanDEM–X radar data (\(\sim \)12-m resolution, reference year 2011; Esch et al. 2011; Esch et al. 2017; Esch et al. 2018).

The spatial representativeness of tile fractions for UKV–M and HAD–B varies with grid resolution of the atmospheric model (1.5 km and 10 km, respectively). The fraction of urban land cover (furban) in the UKV (Fig. 3b) and HadGEM3 (Fig. 3c) varies between the Greater London area and the surrounding urban regions. The coarser resolution HadGEM3–PRIMAVERA ancillaries (Fig. 3c) may make the outskirts or small urban areas (like Swindon) appear more vegetated. Urban fractions for Greater London, on the other hand, are over-estimated compared with the CEH-based UKV ancillaries (Fig. 3b; cf. Bohnenstengel and Hendry 2016). Highly urban coastal land tiles (Fig. 3c) are likely artefacts from the land-sea mask overlapping with only a small fraction of land with large furban. Affected grid cells are mostly sea, so the effect on the grid box results is small.

Fig. 3
figure 3

Urban regions in the south-east UK as derived from (a) GUF (Esch et al. 2011; Esch et al. 2017; Esch et al. 2018) and urban land-cover fractions (furban) used in (b) the UKV and (c) HadGEM3–PRIMAVERA. See text for details on resolution and reference years

The land cover for the HAD–B and UKV–M runs is extracted from grid boxes containing the observation sites. In the HadGEM3 ancillaries, Swindon is represented by a single urbanised grid box (furban = 0.23, Table 1b). The KCL site in central London is located in a grid box with a very high urban fraction (furban = 0.98; Fig. 3c). In the UKV ancillaries, KCL sits at the border of two grid boxes (UKV west/east; Fig. 4) with quite different land cover (Table 1a). UKV–M at KCL is run for both grid boxes as the resolution accuracy of the computational mesh (\(\mathcal {O}(10\) m)) means that the site can be situated in either.

Fig. 4
figure 4

Land-cover distribution in central London around the KCL site (red dot), with wind rose (January 2011–December 2013; inset) and the UKV west/east grid boxes. The 500-m radius area (large circle; region T) contains the 80% source area of turbulent heat fluxes for the site (neutral and unstable conditions). Labels (A–D) indicate the four sectors defined to sub-sample the CTRL runs. CTRL–T is for the complete circle

Land-cover and building-morphology parameters for the CTRL runs are for a 500-m radius area around each site, covering the source areas of the turbulent fluxes and radiation instruments (Ward et al. 2013; Kotthaus and Grimmond 2014b). For Swindon, the CTRL runs use the land cover reported by Ward et al. (2013) and Ward et al. (2016). For London/KCL, the land cover is derived from high-resolution (1m) GIS data using the Urban Multi-scale Environmental Predictor (UMEP; Lindberg et al. 2018). At this site, the land cover varies strongly by wind direction. The River Thames dominates the area to the south, whereas the north is densely built-up (Fig. 4). For the CTRL runs, the total 500-m radius area (region T) is used, together with four additional segments (regions A, B, C and D; Fig. 4) that are sub-sampled based on wind direction for the evaluation of the turbulent heat fluxes (Section 5.1). At Swindon, there is slightly more vegetation to the north-east and slightly more impervious surfaces to the south-west (Fig. 2 in Ward et al. 2013), but land-cover variations are not as marked as in London.

3.2 Building morphology

The input parameters required for MORUSES are as follows: mean building height (H), canyon aspect ratio (H/W) and canyon-width ratio (W/R), where W is the street-canyon width and R is the combined length of canyon and roof. The ratios can be related to the plan and frontal-area fractions (λp, λf; Porson et al. 2010a) via:

$$ \frac{W}{R} = 1-\lambda_{p} $$
$$ \frac{H}{W} = \frac{\pi}{2} \frac{\lambda_{f}}{(1-\lambda_{p})} , $$

where λp is the proportion of the urban tile occupied by buildings (i.e. the roof fraction), so that W/R is the fraction of non-building impervious (street canyon) surface.

The operational UKV morphology ancillaries are generated from empirical polynomial relations derived from high-resolution (\(\mathcal {O}\)(1 m)) 3D London morphology data (Virtual London dataset; Evans et al. 2006) that form the basis of the parametrisations of λp, λf (Bohnenstengel et al. 2011) and H (Gilham et al. 2019) with the urban fraction in the grid box, furban. Given the \(\sim \)1-km resolution for NWP, a 50-m radius smoothing filter removed small-scale land-cover variations (details in Bohnenstengel et al. 2011). As the empirical relations were only derived for one city, their use elsewhere should be assessed or the MORUSES morphology parameters should be derived from local data.

For the CTRL runs in London, H, λp and λf are derived directly from 1-m resolution GIS data. For consistency, Eq. 3 is used to obtain H/W despite UMEP being able to derive the aspect ratio directly (Lindberg et al. 2018). For the Swindon site, the morphology and roughness characteristics reported by Kent et al. (2018) are used.

3.3 Roughness parameters

Roughness and morphology parameters for the offline runs are summarised in Tables 2a3a and 4. HAD–B uses the default Best-1T z0 of 1 m. In all runs with the Best-1T model, zh/z0 uses the fixed default value of 10− 7 (Best et al. 2006).

Table 2 Overview of parameters used for JULES (Best 1T and MORUSES) in central London (KCL; ‘L’) and Swindon (‘S’)
Table 3 As Table 2, but for London/KCL CTRL–M runs for sectors A–D (Fig. 4)
Table 4 Ratio of roughness lengths for heat and momentum for the MORUSES runs at (a) London/KCL and (b) Swindon

In MORUSES, zh is calculated from the bulk resistances to scalar transport of the roof and canyon tiles (Porson et al. 2010a, and Eq. 10 in Appendix 2). The canyon resistances vary in response to H/W (Section 2.1). In the isolated-building flow regime (H/W ≤ 1/3), the upper limit of zh in MORUSES is fixed to 0.1z0m, where z0m = 0.05 m is the prescribed material roughness length for momentum.

The resistance network is driven by the flow at the roof and canyon top, obtained by extrapolating from the model-forcing level without correcting for stability effects and assuming that the logarithmic layer extends down to the building top (Harman et al. 2004). The roughness parameters, zd and z0, needed for this calculation can be obtained from the Macdonald et al. (1998) morphometric method available in MORUSES (Eqs. 8 and 9; Appendix 2) or can be prescribed. For the CTRL runs, the Macdonald z0 and zd are calculated externally and then prescribed. Value differences between UKV–M and CTRL (Table 2a) are caused by the differences in resolution and method of derivation of the underlying morphology data.

Modifications to the Macdonald method to account for building-height variability, like the Kanda et al. (2013) approach, have been shown to provide better predictions of wind speeds above cities (Kent et al. 2017; Kent et al. 2018a). While in the Macdonald approach zd scales with H, Kanda’s modification accounts for the maximum building height within zd. For the 500-m radius area around the London site (region T; Fig. 4), the Kanda z0 = 2.3 m and zd = 30 m (i.e. approximately 10 m above H) are 0.5 m and 17 m greater, respectively, than the Macdonald values (Table 2a).

Being able to reflect the impact of increased drag on the flow is crucial if the ULSM is applied in environments with tall buildings and/or a large heterogeneity of building heights. Although the Macdonald model available in MORUSES effectively limits the displacement height such that zd ∈ [0,H] (Eq. 8), the facility exists to supply other data as ancillaries. Conceptually, JULES estimates the surface exchange at a height of zd + z0, representing an effective surface in the model. This means that the effective height of the street canyon in MORUSES is Hzdz0. If Hzd + z0, as can be the case in high-rise city centres, the velocity at the canyon top currently defaults to zero. MORUSES requires further development in order to make use of parameterisations that enter this regime. For the roof tile, the criterion is slightly less strict with 1.1Hzd + z0.

3.4 Radiation and thermal controls

Urban radiative and thermal parameters used in this study are summarised in Table 2b and c. UKV–M prescribes the same radiative and thermal urban fabric properties across UK cities, assuming asphalt roads, brick walls and clay roofs (Porson et al. 2010a; Bohnenstengel et al. 2011). In London, the CTRL–M radiative parameters are adjusted to provide the best match to the observed bulk parameters reported by Kotthaus and Grimmond (2014a). For CTRL–M at Swindon, the default UKV parameters are representative of typical building materials at this site (Ward et al. 2013).

In MORUSES, the ‘thin-roof’ setting is used (UKV default), by which insulation is modelled through a very small roof thickness (the minimum of Δzf = 0.02 m and the calculated damping depth). Earlier comparisons with the ‘thick-roof’ case, for which \(\varDelta z_{f} \propto k^{0.5}_{f}\) (Eq. 45 in Porson et al. 2010a), showed that the ‘thin-roof’ option improves the phase of QH (Porson et al. 2010b).

HAD–B uses the default bulk values of αU = 0.18, εU = 0.97 and CU = 0.28 MJ K− 1 m− 2. For CTRL–B, the bulk αU and εU are derived from observations (Kotthaus and Grimmond 2014a; Ward et al. 2013), while CU was left unchanged from the default HAD–B value.

In all runs, the (snow-free) albedos for bare soil (α = 0.11) and inland water (JULES–UKV: α = 0.06; JULES–GL7.0: α = 0.12) are prescribed. For vegetated surfaces, JULES calculates the albedo as a function of leaf-area index (LAI; m2 m− 2; Best et al. 2011).

3.5 Vegetation controls

LAI and canopy height (Hc; m) for UKV–M and HAD–B are based on the UKV and HadGEM3–PRIMAVERA ancillaries that vary spatially and by month. In this data source, LAI is effectively zero for all PFTs in highly urbanised grid boxes such as central London, although vegetation is present (Fig. 4). For UKV–M at Swindon, this also affects the tree and shrub tiles. The LAI ancillaries are based on the relatively coarse IGBP land-cover mapping (Fig. 3c), which for the UKV can lead to inconsistencies with the CEH land cover. The canopy height, however, is non-zero so that roughness effects of the vegetation are modelled.

In case of near-zero LAI, JULES treats the vegetated tiles effectively as bare-soil surfaces with zero evapotranspiration and minimised water-holding and interception capacity (Best et al. 2011). In this study, the affected vegetation fractions are low (0.01–0.12; Table 1), but even small amounts of vegetation can have a strong impact on the surface-energy partitioning in urban areas (Grimmond et al. 2010; Loridan and Grimmond 2012a; 2012b).

Another feature shared by the UKV and HadGEM3–PRIMAVERA ancillaries over the southern UK is an anomalous rise in LAI in December/January following a minimum in November. This affects all PFTs and stems from the underlying MODIS Terra 4-km LAI dataset (collection 5; 2005–2009). The newer MODIS (collection 6) products do not show this artefact over the same analysis period.

The CTRL runs use the JULES leaf-phenology model (Cox 2001; Clark et al. 2011) to simulate leaf-on/leaf-off periods in response to temperature changes and the UKV Hc ancillaries are prescribed.

3.6 Anthropogenic heat flux

The CTRL runs use the monthly median GQF results for London/KCL and inventory emissions for Swindon (Fig. 2), both with and without scaling to the urban fraction (cf. Section 2.3). The former option is consistent with the way in which the value is used within JULES and the latter option results in an underestimation of the heat supplied to the system with respect to the GQF and inventory emission values. This is because JULES assumes that QF is released only on the urban tile(s); thus, the grid value is correspondingly reduced. The GQF and inventory data, however, are for the complete area. Figures S1 and S2 in the supplementary Online Resource show results from using the scaled values of QF, consistent with the assumptions within JULES. CTRL–B in London (Fig. S1d) has a reduced QH MBE over all seasons and CTRL–M/B in DJF both benefit from the increased magnitude of QF when compared with the results using the unscaled values (Section 6.1.2; Fig. 5d), but the scaling leads to larger errors overall. However, there is a large uncertainty in the estimation of QF and it is unclear what a representative magnitude of QF ideally should be for the sites. Therefore, we discuss the CTRL runs with the unscaled QF in Section 6.

Fig. 5
figure 5

Modelled and observed median seasonal diurnal cycles with inter-quartile range (shading) at London/KCL for (a) K, (b) L, (c) Q, (d) QH and (e) QE in winter (DJF), spring (MAM), summer (JJA) and autumn (SON). All quantities represent grid box averages. The sample size (N) of the hourly data pairs and the seasonal 24-h MAE / MBE (W m− 2) shown for (1) CTRL–M, (2) CTRL–B, (3.a) UKV–M (east), (3.b) UKV–M (west) and (4) HAD–B, with the lowest MAE and |MBE| highlighted (bold). See Section 5.2 for metrics and Table 2 for run parameters

In UKV–M, the default JULES–QF values are used (Fig. 2), and QF is neglected in HAD–B, reflecting the JULES settings for the UKV and GL7.0.

3.7 Model spin-up and output

For each of the four JULES soil layers (0.1, 0.25, 0.65 and 2.0 m depths), soil temperature and soil wetness (mass of soil water as a fraction of water content at saturation) are initially prescribed. The run is considered ‘spun-up’ if the change in soil-moisture content (kg m− 2) is below 1% and the change in soil temperature below 0.2 K over a spin-up cycle. The first year of forcing data is used for the spin-up.

The soil states reached after the successful spin-up completion differ in response to the JULES configuration and land-cover ancillaries (cf. Appendix 3). The differences in soil states are maintained (qualitatively) throughout the evaluation period, indicating another difference between experimental setups. However, repeating the CTRL–M/B simulations with swapped initial conditions in a sensitivity test, in London results in negligible differences between the turbulent heat-flux statistics after the first 4 months or so of the simulation. Whereas in the more vegetated Swindon, differences persist over the first year with the largest impact in summer.

Results from the offline runs are a sample at the model time step (1 h). Seasonal and diurnal statistics are computed in a post-processing step, in which the model output is analysed only when observations are available (Section 4).

4 Observations

For model forcing and evaluation hourly averages measured in central London (KCL; 51.511N, 0.117W) and Swindon (51.585N, 1.798W) are used. The London/KCL data comprise measurements from two sites on the same roof (horizontal distance < 45 m; vertical offset 1.4 m), which operated consecutively during the study period (Kotthaus and Grimmond 2014a, 2014b; Bjorkegren et al. 2015). Following Ward et al. (2016), we define the measurement height as the height of the wind-speed measurements above ground level (agl), which is 10.6 m in Swindon and 49.6 m at KCL. The latter is the average height of both KCL sites.

The forcing height in JULES is the height of the instruments above the local zd (Table 2a; KCL zd for region T). For KCL, this is 36.4 m; for Swindon 8.6 m. At both sites, the JULES forcing heights for temperature/humidity (zT,q) and wind (zu,v) are equal.

Table 5 (Appendix 4) gives an overview of observed variables and instrumentation. Observed forcing variables (Table 5a) are K, L, air temperature (Tair), rainfall rate (RR), wind speed (Uh), specific humidity (q) and atmospheric pressure (P). Periods of snowfall and settled snow are insignificant at both sites during the evaluation periods (Ward et al. 2016). The forcing data are gap-filled observations used in the offline evaluation of the SUEWS land-surface model for 01 January 2011–31 December 2013 (Ward et al. 2016).

The evaluation focuses on the outgoing radiation components (K, L) and the net all-wave radiation (Q), measured with radiometers, and the turbulent heat fluxes (QH, QE) derived from eddy-covariance measurements (Table 5b). Evaluation data are available at Swindon over a 2-year period (09 May 2011–25 April 2013) and in London for the entire 3-year simulation span. Occasional data gaps within these periods generally are of short duration. Detailed descriptions of site characteristics, instrumentation, data processing techniques and analyses are provided by Kotthaus and Grimmond (2014a, 2014b) and Bjorkegren et al. (2015) for London/KCL and Ward et al. (2013) for Swindon.

5 Evaluation methods

5.1 Source-area sub-sampling

The JULES tiling scheme does not account for the distribution of land-cover features within a grid box. In order to reflect the difference in the eddy-covariance source-area characteristics around London/KCL (Fig. 4), the CTRL–M and CTRL–B results are created by sub-sampling the grid box QH and QE of the individual CTRL runs for sectors A to D based on the observed horizontal wind direction (𝜃). For the 3-year study period, 52.6% of samples fall into region A, 9.6% in B, 5.8% in C and 32% in D (cf. wind rose, Fig. 4).

The four source-area segments have large differences in land-cover characteristics (Table 1): region A (0𝜃 < 100 and 230𝜃 < 360) is dominated by impervious surfaces with little vegetation; region C (140𝜃 < 180) includes large fractions of the River Thames and regions B (100𝜃 < 140) and D (180𝜃 < 230) include water, impervious surfaces and urban park vegetation.

The source area of the radiometers is independent of 𝜃, but shadowing fractions vary by time of day and seasonally. In London, the runs for region T are compared with the radiation observations.

5.2 Evaluation metrics

To quantify the model performance, predicted (Pi) and observed (Oi) variables are compared in terms of evaluation metrics. These include the mean absolute error:

$$ \textrm{MAE} = \frac{1}{N} \sum\limits_{i=1}^{N}(\vert P_{i} - O_{i}\vert) , $$

the mean bias error:

$$ \textrm{MBE} = \frac{1}{N} \sum\limits_{i=1}^{N} (P_{i} - O_{i}) , $$

the normalised centred root-mean-square error:

$$ \textrm{nRMSE} = \frac{\sqrt{\frac{1}{N} {\sum}_{i=1}^{N} [(P_{i}-\overline{P}) - (O_{i}-\overline{O})]^{2}}}{\sigma_{O}} $$

and the correlation coefficient (Pearson’s R):

$$ \mathrm{R} = \frac{\frac{1}{N} {\sum}_{i=1}^{N} [(P_{i}-\overline{P})(O_{i}-\overline{O})]}{\sigma_{P} \sigma_{O}} , $$

where N is the sample size and σO and σP are the standard deviations of observations and predictions, respectively.

Online evaluations of the UKV run with MORUSES and Best 1T noted different model behaviours based on the timing of turbulent heat fluxes in the morning (Warren et al. 2018). For the offline tests, the metrics are therefore computed for 24-h periods and for sections of the diurnal cycle. Local sunrise (SR), solar noon (SN) and sunset (SS) times are used to aggregate the samples at time t into one of four periods:

  1. 1.

    Morning (M): SR+ 1 h < t ≤ SN

  2. 2.

    Afternoon (A): SN < t < SS-1 h

  3. 3.

    Transition (T): SR–1 h ≤ t ≤ SR+ 1 h and SS–1 h ≤ t ≤ SS+ 1 h

  4. 4.

    Night (N): SS+ 1 h < t < SR–1 h

6 Offline ULSM evaluation results

6.1 Seasonal-diurnal USEB variability

SEB components are analysed in terms of median diurnal cycles of grid box averages (i.e. averaged over all tiles) in central London (Fig. 5; 3-year period) and suburban Swindon (Fig. 6; 2-year period) in winter (DJF), spring (MAM), summer (JJA) and autumn (SON). Differences in the JULES–ULSMs performance are overall more pronounced for London, as in Swindon the urban plan-area fraction is a much smaller part of the grid box and therefore the contribution to the fluxes is lower (Table 1). Hence, the Swindon results are to a much larger degree determined by processes in the vegetated canopies.

Fig. 6
figure 6

As Fig. 5, but for Swindon with the the seasonal 24-h MAE / MBE (W m− 2) for (1) CTRL–M, (2) CTRL–B, (3) UKV–M, (4) HAD–B

6.1.1 Radiation and storage-heat flux

The energy available for partitioning into the turbulent heat fluxes is determined from Q + QFΔ QS (Eq. 1). As QF is prescribed, first it is important to evaluate the outgoing radiation (K, L) and storage-heat flux. While the radiometer at London/KCL sees almost exclusively roofs, urban canyons and some trees, in Swindon K and L are to a large degree affected by vegetation.

Differences in prescribed bulk and material albedos (Table 2b) impact K (Figs. 5a6a). At both sites, HAD–B consistently shows the largest deviation from the observations. This can partly be attributed to the comparatively large default urban albedo used in JULES–GL7.0 (αU = 0.18). The different land-cover make-up for HAD–B compared with the observed source area also contributes to the differences in the grid box K, particularly in London through a larger contribution from the urban tile (furban = 0.98). However, even when prescribing land cover that is representative of the radiometer source area (as in CTRL), the HAD–B error remains larger compared with the other runs (cf. Online Resource Figs. S4a, S5a), highlighting the impact of αU. CTRL–B (Figs. 5a6a) uses the observed αU of 0.11. In London, this lower bulk urban albedo together with the addition of 21% inland water (α = 0.06) in CTRL–B (region T, Table 1a) greatly improves the agreement with the observations over 24 h (low MAE and |MBE|).

MORUSES assumes a (fixed) fraction of 0.3 of K is scattered by the sky and uses the Masson (2000) shadowing factors and multiple reflections within the canyon to obtain the canyon albedo (αc). The latter depends both on H/W and on solar zenith angle (cf. Fig. 3 in Porson et al. 2010a), resulting in seasonal and diurnal variations. Depending on H/W and the difference of the wall and road albedos prescribed, αc can reach a maximum or minimum at solar noon (cf. Figs. 6,7 in Masson 2000). The former is the case if wall and road albedos are similar (e.g. CTRL–M in London; Online Resource Fig. S3a), while the latter regime arises when they are very different (default JULES–UKV values for αr, αw; Online Resource Fig. S3b,c).

αc reported in Table 2b is the JJA median at solar noon, when the albedo has the greatest impact as K is generally at its peak. However, especially for the runs using the UKV–M material albedos, αc shows strong seasonal-diurnal variability (Online Resource Fig. S3b,c). In London, CTRL–M (region T) computes a very low JJA canyon albedo of 0.04 (much lower than the roof albedo of 0.13) that varies only little by season or time of day. Using the default UKV material albedos, CTRL–M in Swindon and UKV–M at both sites have the same JJA αc of 0.08. However, the differences between αc for UKV–M east/west are larger in other seasons (cf. Online Resource Fig. S3b) as trapping effects in MORUSES are more pronounced when the sun is high, and hence are stronger in summer than in winter and stronger for larger H/W.

A bulk urban albedo, αU, for the canyon–roof combination can be obtained as the weighted average of αc (here the JJA median) and αf (Table 2b) to compare with the prescribed Best-1T value. While CTRL–M in London has an αU of 0.09, the UKV–M runs result in bulk albedos of 0.11 (east grid) and 0.12 (west grid; Table 2b). For both UKV–M runs, the over-prediction of the grid box K is notably reduced compared with HAD–B (Fig. 5a). As in both HAD–B and UKV–M (west), the results are dominated by the urban tile(s) (98% and 95% total urban fractions, respectively); the bias reduction seen for the latter can be attributed to the lower αU computed by MORUSES. Differences between the UKV–M east/west results can be explained by the contribution of the inland-water tile in the east-grid run (14%; Table 1a), which has a very low prescribed albedo (0.06 in JULES–UKV). In Swindon, the larger amplitude of K compared with London is better predicted by all models, with HAD–B again having the largest over-prediction attributable to the larger αU (cf. Online Resource Fig. S5a).

In London, Best-1T (CTRL–B, HAD–B) has a notably larger grid box L (Fig. 5b) in spring and summer compared with the observations, with a \(\sim \)1-h delay of the peak and larger amplitudes throughout the afternoon and early night. In Swindon (Fig. 6b), this feature is present in the CTRL–B results (furban = 0.49), but not in HAD–B, for which the urban fraction is much lower (0.23) and the grid box L is dominated by vegetation. MORUSES captures better the phase of the observations, but shows a slight systematic under-prediction of L over 24 h at both sites and for all configurations.

In addition to the surface emissivity (Table 2b), L is controlled by the surface temperature (T), which is affected by the effectiveness of heat storage within the urban tile and hence by the areal heat capacity. For CTRL–B and HAD–B, the bulk urban CU is fixed at 0.28 MJ K− 1 m− 2 (Table 2c), allowing the urban tile to store a large fraction of Q. The radiative coupling to the ground means that the results can also be affected by differences in soil-temperature states between configurations. In London, for example, the HAD–B soil in each layer is colder by \(\sim \)2 K compared with CTRL–B (Online Resource Fig. S8a), which may lead to a reduction of the surface temperature and hence L. Soil temperatures would be reduced further if HAD–B used the region-T land cover (not shown), leading to an improvement in grid box L in MAM and JJA as furban is lowered (Online Resource Fig. S4b). However, the over-prediction in spring/summer persists if QF is accounted for in HAD–B, as illustrated by the CTRL–B results (Fig. 5b).

In MORUSES, the different heat capacities for canyon and roof (Cc, Cf; Table 2c) allow large differences in thermal inertia. The roof areal heat capacity is computed from Cf = cfΔzf. In all MORUSES runs, this is equal to the UKV–M value of \(\sim \)0.04 MJ K− 1 m− 2, resulting from the ‘thin roof’ setting with Δzf = 0.02 m, which is an order of magnitude lower than the computed material thickness of walls and road (\(\varDelta z_{r,w} \propto k^{0.5}_{r,w}\)). The canyon areal heat capacity, as the sum of wall and road components, is directly proportional to the height-to-width ratio: Cc = 2HW− 1cwΔzw + crΔzr (Porson et al. 2010a). The large effect of H/W on Cc is illustrated in the values of the four London/KCL source areas (Fig 4). Cc varies between 0.41 (region C; H/W = 0.36) and 1.18 MJ K− 1 m− 2 (region A; H/W = 1.66; Table 3a, c). For region T, it takes a value of 0.9 MJ K− 1 m− 2 (H/W = 1.18; Table 2a, c). The heat-storage capacity of the MORUSES canyon in London is larger than for the Best-1T urban slab (Table 2c). An approximate bulk CU for MORUSES can be computed from Cc and Cf based on the proportions of roof and canyon, resulting in CU = 0.42 MJ K− 1 m− 2 for CTRL–M (region T), followed by UKV–M (west) and UKV–M (east) with 0.33 and 0.26 MJ K− 1 m− 2, respectively (Table 2c).

As the areal heat capacity increases, the diurnal amplitude of T decreases. As a result, L decreases during the day as more heat is stored in the urban fabric, and increases at night when it is released back into the atmosphere. The comparatively large areal heat capacities of the Best-1T slab and the MORUSES canyon, together with the radiative/conductive coupling to the soil, result in a phase delay of L. However, in MORUSES, this lag is partially compensated by the fast response of the insulated roof to radiative forcing and its contribution to the weighted average of L for the grid box, which contributes to the improved agreement with the observations. The effects of H/W and W/R on MORUSES are further discussed in Section 6.3.

The good match between modelled and observed Q for the MORUSES runs and CTRL–B can be explained partially by compensating errors between K and L (Figs. 5c6c) and the large role of the observed K (model forcing) in Q. The London HAD–B run has the poorest 24-h MAE, and under-predicts Q in spring and summer as a result of the over-prediction of both outgoing radiation components (Fig. 5c).

6.1.2 Turbulent heat fluxes

Modelled turbulent heat fluxes (grid box-averaged) agree better with the observations in suburban Swindon (Fig. 6d,e) compared with central London (Fig. 5d,e), as was found in the SUEWS land-surface model evaluation at these sites (Ward et al. 2016). For both QH and QE, the configuration variability is lower in Swindon as parameter and physics differences of the urban schemes are partially masked by the larger contribution of non-urban surfaces compared with London (Table 1).

In London, QH from CTRL–B and HAD–B has a consistent phase shift of up to \(\sim \)3 h relative to the observations (Fig. 5d). Differences between the two Best-1T runs can be attributed to (i) including realistic QF emissions in CTRL–B rather than assuming QF = 0 W m− 2 as in HAD–B; (ii) the lower bulk albedo in CTRL–B resulting in a larger Q; and (iii) the increased contributions of non-urban tile fluxes to the grid box average in CTRL–B (Table 1a), somewhat reducing the phase delay from the urban tile. The MORUSES runs (CTRL–M, UKV–M) capture the phase of QH better since the large differences between the canyon and roof SEBs are represented. The remaining delay in the timing of the grid box QH for the two UKV–M runs can be attributed to the larger W/R (Tables 2a3a) compared with the sub-sampled CTRL–M results, which are dominated by region A (52.6%) and D (32%). The small MORUSES Cf results in a faster and earlier increase of the roof T compared with the canyon, leading to an earlier phase and increased amplitude of QH. The larger thermal inertia of the canyon reduces the diurnal variability of the canyon temperature and delays the daytime QH peak.

For the London site, the largest relative differences of the peak QH between configurations occur in SON and DJF (Fig. 5d), when anthropogenic heat emissions from space heating are higher. The observed SON/DJF QH does not become negative at night, and is higher in winter than autumn (median of \(\sim \)50 cf. \(\sim \)40 W m− 2). The DJF QH for HAD–B, which neglects QF, is negative over large parts of the day. Using JULES–QF (Fig. 2) in UKV–M improves this, but the results are still closer to HAD–B than to the CTRL runs that use a higher QF representative of central London from GQF. In winter, the monthly median GQF QF (Fig. 2) is larger than the observed median peak Q (Fig. 5c), and hence makes a large contribution to the energy balance. The impact of the representation of QF on the modelled QH is discussed in Section 6.2. Adjusting QF supplied to the CTRL runs by the local urban fraction (cf. Section 3.6), so that it is consistent with the way in which JULES applies it, increases the magnitude of the grid box QH, which in London leads to better daytime results in winter and autumn, but a notable over-prediction at night in all seasons (Online Resource Fig. S1d). The positive nocturnal bias is a result of not using the diurnal cycle of the GQF results, which have much higher QF during the day. The smaller furban in the Swindon grid box makes the QF-scaling effect less prominent (Online Resource Fig. S2d).

In Swindon, QF contributes less to the energy balance and the seasonal variability of model performance is reduced compared to London. HAD–B is the only model configuration consistently underestimating QH. However, if HAD–B has the CTRL land cover, the performance is similar to CTRL–B (Online Resource Fig. S5d), i.e. the difference can be attributed to the \(\sim \)10-km HadGEM3–PRIMAVERA ancillaries with more non-urban surfaces assumed. Hence, compared with CTRL–M/B (furban = 0.49) and UKV–M (0.31; Table 1b), HAD–B has larger QE (Fig. 6e). The JULES–ULSMs over-predict the sensible heat flux at night and do not reproduce the negative night-time QH in Swindon in summer (Fig. 6d), similar to SUEWS (Ward et al. 2016).

At both sites, the ratio of roughness lengths of heat and momentum computed by MORUSES is larger than the default Best 1T value of 10− 7, in the case of CTRL–M at Swindon by almost five orders of magnitude (Table 4), leading to a larger amplitude of QH (cf. Fig. 3 in Best et al. 2006). Variability of the roughness ratio among the MORUSES runs is related to differences in local morphology.

Large seasonal variations of modelled QE in London (Fig. 5e) are related to the presence of the River Thames and urban vegetation (Fig. 4). The CTRL and UKV–M (east) runs have considerable water-tile fractions (Table 1a). This is not the case for UKV–M (west) and HAD–B, for which furban = 0.95 and 0.98, and QE is strongly underestimated during the day. For HAD–B this results in near-zero median QE in all seasons as only rainfall is available for evaporation, but removed quickly from the impervious surface (Section 6.4). Including the river and vegetation in HAD–B using the region T land cover would greatly improve QE (Online Resource Fig. S4e), while in the absence of anthropogenic heat sources QH would be less well predicted (Online Resource Fig. S4d).

Urban-tile evaporation in Best 1T and MORUSES could be enhanced by allowing for some pervious areas, but implications (e.g. regarding soil-moisture transport) are currently unclear, as are implications of prescribing urban irrigation or other anthropogenic moisture sources.

Observations indicate that KCL is not as strongly affected by evaporation from the river as its proximity suggests (Kotthaus and Grimmond 2014b). This is because of the river surface being vertically displaced below the local ground level and hence the effective height of the flux tower being greater relative to the river. Kotthaus and Grimmond (2014b) discuss that for advection over the river (100𝜃 ≤ 230), with a tidal induced variation of the water surface of \(\sim \)6 m, the eddy-covariance sensor heights can be above the river’s growing internal boundary layer. This is most notable for wind directions at right angles to the river, i.e. region C (Fig. 4), whereas for flow from regions B and D evaporation from the river is more likely measured. This feature of the observations was also identified as a main factor contributing to the positive QE bias in SUEWS at KCL (Ward et al. 2016).

The source-area sub-sampling of the CTRL runs results in the grid box QE being mainly affected by the lake tile in summer, leading to a positive offset of the median diurnal cycle and a much larger 75th percentile compared with the observations. Otherwise, the CTRL runs are dominated by evapotranspiration from broadleaf trees (Table 1a). Since the UKV LAI is 0 for all PFTs (Section 3.5), JULES treats the vegetated tiles as bare soil, for which QE shows a pronounced daily cycle. The river QE has a weaker diurnal amplitude. The lower water fraction in UKV–M (west) results in the negative bias relative to UKV–M (east).

The LAI anomalies (Section 3.5) in Swindon affect the tree and shrub tiles in UKV–M, but not grass, which has the highest plan-area fraction (49%; Table 1b). At this site, the CTRL runs with the lowest vegetation fraction have the best agreement of modelled QE with the observations, but all configurations overestimate the latent heat flux in spring and summer (Fig. 6e). The fact that HAD–B in Swindon has the wettest soil (Online Resource Fig. S8b) appears to be a secondary effect to having the largest vegetation fraction when it comes to the over-estimation of the grid box QE. Analysis of the grass-tile QE shows that HAD–B, although it behaves similarly to the much drier UKV–M, has larger amplitudes of QE in summer and autumn (not shown). Prescribing land-cover representative of the source area of the observations in HAD–B in Swindon reduces QE and increases QH, resulting in a skill comparable with CTRL–B (Online Resource Fig. S5d, e). However, climate applications are clearly dependent on appropriate ancillaries.

6.1.3 Diurnal variations of model performance

Time-of-day–dependent model performance (Section 5.2) for the turbulent heat fluxes is evaluated by season using the MAE (Eq. 4; Fig. 7). Corresponding MBE (Eq. 5) metrics are given in Fig. S6 of the Online Resource. At both sites, the largest cross-configuration variability occurs in the morning and afternoon. In Swindon, for all models, the afternoon MAE (Fig. 7b) is larger than for other periods of the day. Here, HAD–B has large |MBE| (Fig. S6b) for both QH and QE in the morning and afternoon. In spring and summer in London, all models except for CTRL–M have a large negative QH bias in the morning (Fig. S6a). This is associated with the characteristic phase delay of the Best-1T model and the lags determined in the diurnal cycles of the UKV–M runs (Fig. 5d). Compared with suburban Swindon, the central London QE bias is quantitatively more similar for different time periods (Fig. S6a).

Fig. 7
figure 7

MAE (W m − 2) of QH and QE for (a) London/KCL and (b) Swindon per season and by time-of-day (Section 5.2) for each model configuration. Dashed lines are used for ease of visualising relations between model configurations. 24-h values (stars) are given in Figs. 5 and 6

The model errors for the two UKV–M configurations in London (east/west grids; Fig. 4) indicate that neither grid characteristics provide a consistently good representation of the actual source-area properties of the observations. The morning and afternoon MAE are reduced for UKV–M (west) as a result of the increased urban tile fraction (Fig. 7a), but the model tends to over-predict the afternoon QH in summer (Fig. S6a). QE, however, has the lowest errors for UKV–M (east) of all the configurations considered.

The transition periods around sunrise and sunset not exhibiting large absolute errors implies that the models respond accurately to changes in radiative forcing. However, relative errors may be large when the fluxes are small.

6.1.4 Performance summary

Taylor diagrams (Taylor 2001) summarise the model performance at London/KCL (Fig. 8a) and Swindon (Fig. 8b). Metrics shown for the entire evaluation period at both sites include the correlation coefficient (Eq. 7), the normalised standard deviation (σP/σO), which is 1 for a perfect prediction, and the normalised centred root-mean-square error (nRMSE; Eq. 6; 0 for a perfect prediction).

Fig. 8
figure 8

Taylor diagram for (a) London/KCL (3-year evaluation period) and (b) Swindon (2-year period) based on normalised standard deviation (σP/σO; thin arcs), normalised centred RMSE (thick arcs) and the correlation coefficient

For all energy-balance components and model configurations, predictions are poorer at the high-rise, high-density site in London, compared with suburban Swindon. At both sites, errors are larger for the turbulent heat fluxes than for the radiation components, with QE showing the largest cross-configuration variability, particularly in London (Fig. 8a), which can be attributed to the geometric complexity of the site and the variations of the prescribed land-cover fractions between configurations (Table 1a). Even after using high-resolution morphology ancillaries, more realistic monthly QF values and source-area sub-sampling, QH and QE in the CTRL–M/B runs still have large uncertainties in central London.

While the HAD–B performance is better for Swindon (Fig. 8b), model errors are consistently larger compared with the other configurations. Comparisons with CTRL–B show that this is in part from the HAD–B land-cover ancillaries. Model skill can also be improved by using suitable bulk parameters (e.g. αU) and, particularly in London, accounting for anthropogenic heat emissions. While the phase lag of the Best-1T scheme at the suburban site is masked by larger non-urban contributions to the grid box QH, the delay strongly affects the performance in highly urbanised environments such as central London.

6.2 Contribution of QF

The seasonal diurnal cycle of QH in London (Fig. 5d) highlights the large impact of anthropogenic heat emissions on the local USEB when contrasting the CTRL runs with UKV–M and HAD–B. To explore the effect within the currently operational JULES–UKV framework, UKV–M (east) is repeated with (i) the QF source switched off (as in HAD–B) and (ii) using monthly median (unscaled) GQF data (Fig. 2).

Switching off QF entirely only slightly decreases the QH model skill compared with the default run using monthly JULES–QF values that are currently prescribed across the UK (Fig. 9). While the quantitative differences are small, qualitatively the effect of using JULES–QF is not negligible as it leads to QH < 0 W m− 2 occurring less frequently in autumn and winter, which in turn affects temperature profiles above the city. Prescribing the GQF values distinctly improves the results (UKV–M GQF; Fig. 9), with the 24-h MAE and MBE values now being comparable with CTRL–M (Fig. 5d).

Fig. 9
figure 9

As Fig. 5d, but for UKV–M (east) with different prescribed monthly QF values: 0 W m− 2 (no QF), default JULES–QF (results as in Fig. 5d) and monthly median (unscaled) GQF values (cf. Fig. 2)

Diurnal and weekday/weekend variations of heat emissions have been removed from the prescribed GQF data by following the current JULES approach of using only monthly values (Section 3.6). At KCL, averaging the hourly GQF output results in smaller (greater) daytime (night-time) QF values. This could explain some of the remaining differences to the observations in autumn and winter (UKV–M GQF, Fig. 9; CTRL–M, Fig. 5d), when QF is both greatest and shows the largest temporal variability (Fig. 2).

6.3 MORUSES morphology controls

In the operational JULES–UKV, the empirical relations (Bohnenstengel et al. 2011; Gilham et al. 2019) used to obtain H, H/W and W/R as a function of furban (Section 3.2) reduce the possible morphology variations from three degrees of freedom to one as the parameters are linked. The Section 6.1 MORUSES results demonstrate the influence of street-canyon morphology (H/W) and roof–canyon partitioning on the grid box-averaged fluxes. While canyon geometry affects the bulk canyon parameters (e.g. αc, Cc), W/R determines the contributions of canyon and roof to the grid box average and, via the Macdonald roughness parameters, the value of zh.

Figure 10 shows QH and thermal inertia, CdT/dt, modelled with MORUSES in response to varying (i) W/R (H/W = 1.18 fixed; Fig. 10a–c) and (ii) H/W (W/R = 0.44 fixed; Fig. 10d–f). The mean building height (H = 19.81 m) is held constant, which conceptually implies that for (i) both H and W remain fixed while the roof length changes, whereas for (ii) both canyon and roof lengths change so that both H and W/R remain constant. The sensitivity test uses UKV–M (London/KCL) for a completely urbanised grid box (furban = 1). The fixed parameter values of H, H/W and W/R are the CTRL values for region T (Fig. 4; Table 2a), while thermal and radiative parameters are the JULES–UKV defaults (Table 2b, c). Results (Fig. 10) are shown for the median JJA diurnal cycles, but are qualitatively similar throughout the year.

Fig. 10
figure 10

Response of the median summer (JJA) diurnal cycles of UKV–M QH and CdT/dt to changes in MORUSES morphology parameters: (ac) canyon fraction (W/R) and (df) height-to-width ratio (H/W), setting furban to 1 at London/KCL. In all cases, H = 19.81 m is held constant. In (ac) H/W is fixed at 1.18, in (df) W/R is 0.44 (CTRL region T values; Table 2a). The respective Best-1T curves for the urban tile (dotted lines with circles) are shown as a reference using CTRL–B parameters (Table 2)

In the sensitivity test, MORUSES computes z0 and zd with the Macdonald method (Section 3.3). While z0 depends on H, H/W and W/R, zd only varies with H and W/R and hence remains constant if H/W is varied (Eqs. 8,9). The MORUSES resistance network depends on H/W, z0 and zd. Therefore, the resultant roughness lengths for heat (zh; Eq. 10) of the canyon and roof tiles change throughout the analysis. For the results in Fig. 10, the corresponding curves for z0, zd and zh for varying W/R (Fig. 12a) and H/W (Fig. 12b) are discussed in Appendix 2.

The contrast between roof and canyon tile fluxes is drastic. With the very small heat capacity of the roof (Table 2c), the corresponding tile QH has much larger amplitudes and peaks earlier in the day (similar in timing to Q; Fig. 5c) compared with the canyon (Fig. 10a,d), but contributes only marginally to the net storage heat flux (Fig. 10c, f).

For a very small canyon fraction (W/R = 0.01; Fig. 10a), the canyon zh defaults to 10− 30 m in this test (Fig. 12a), leading to a very low canyon QH (Fig. 10a). The roughness length defaults to z0m = 0.05 m for W/R ≤ 0.25, while the roof zh increases before it continuously decreases from a maximum at W/R = 0.25 (Fig. 12a), which leads to the initially increasing then decreasing peaks of the roof QH (Fig. 10a). Once the canyon fraction is larger than \(\sim \)0.5, QH and thermal inertia vary very little.

Increasing W/R successively reduces the diurnal amplitude of the grid box QH and delays the timing of the peak values up to \(\sim \)3 h (Fig. 10b) as the contribution of the roof tile to the plan-area averaged fluxes decreases. For small canyon fractions, the grid box results reflect the quick response of the (insulated) MORUSES roof to radiative forcing, resulting in very high/low roof surface temperatures during the day/night. The larger thermal inertia of the canyon (Fig. 10c) means smaller diurnal variations of the canyon surface temperature and hence smaller temperature gradients between canyon and atmosphere. In all test scenarios, the nocturnal roof QH is negative (Fig. 10a,d). For W/R between 0.1 and 0.9, there is a \(\sim \)100 W m− 2 difference in the peak values of the grid box QH and the diurnal amplitude decreases from \(\sim \)275 to \(\sim \)100 W m− 2 (Fig. 10b) as the heat-storage amplitude increases (Fig. 10c). As the canyon fraction increases, MORUSES approaches the Best-1T modelling framework of a single impervious slab with large thermal inertia and the weighted ratio of canyon and roof zh to z0 approaches the Best-1T value of 10− 7 (Fig. 12a).

Varying H/W at a fixed W/R = 0.44 in the isolated-building regime (H/W ≤ 1/3) results in the model defaulting to zh = 0.1z0h for canyon and roof (only at H/W = 0.3; Fig. 12b), which contributes to the non-monotonic behaviour evident in the amplitudes of the tile and grid box QH (Fig. 10d, e). The variation of the grid box QH (Fig. 10e) between H/W = 0.3 (isolated-building regime) and 3.0 (skimming flow) is smaller compared with the response to changes in W/R, with differences in the daytime peaks of \(\sim \)60 W m− 2 and diurnal amplitudes reducing by \(\sim \)100 W m− 2.

The large effect of H/W on the areal heat capacity of the canyon and therefore on its thermal inertia (Fig. 10f) affects both the amplitude and phase of the canyon QH (Fig. 10d). As more heat can be stored with increasing H/W, temperature gradients become smaller during the day and larger at night when the heat is released (CcdT/dt < 0). For H/W = 3.0, which is a typical value for high-rise city centres, the storage heat flux into the canyon has become very large during the day (\(\sim \)290 W m− 2; Fig. 10f) so that QH only starts to rise around noon and reaches its maximum at midnight (Fig. 10d). A similar pattern is present in the canyon QH for region A of the CTRL–M run at KCL (H/W = 1.66; W/R = 0.41; Table 3; not shown). The night-time canyon QH increases by approximately a factor of 2 when increasing H/W from 0.3 to 3 (Fig. 10d). Although the presence of the roof dampens this response, the grid box-averaged QH is still increased noticeably at night (Fig. 10e).

The analysis shows that the availability of accurate urban morphology ancillaries is crucial if MORUSES is to represent the urban energy-balance successfully. Constraining the proportion of the canyon, which can be more easily derived from satellite data, has a larger impact on the results than H/W. Repeating the sensitivity test by allowing H to vary to obtain different W/R or H/W leads to quantitatively similar results (not shown).

6.4 Meteorological controls

The hourly model bias (PiOi) of the turbulent heat fluxes and L from UKV–M in London (east grid) and Swindon is analysed as a function of meteorological forcing, focusing on air temperature (Tair; Fig. 11a, b), horizontal wind speed (Uh; Fig. 11d) and rainfall (hours after rain; Fig. 11e, f). Evaluating the bias with horizontal wind direction (𝜃, Fig. 11c), unaccounted for in the JULES tiling approach, is useful to assess the impact of subgrid-scale land-cover heterogeneity.

Fig. 11
figure 11

Hourly UKV–M model bias (PiOi; scatter coloured by stability parameter ζ = (zzd)/L) of QH for 3 years at London/KCL (east grid, Fig 4) and 2 years in Swindon in response to (a) air temperature (Tair), (c) wind direction 𝜃, (d) wind speed (Uh), (b) model bias of L as a function of Tair and (e, f) model bias of QE depending on hours after rainfall with (f) showing the bias over the initial 12 h. The 1st and 99th percentiles of the hourly biases are shown (dashed lines). Boxplots are for bins of (a, b) 2C, (c) 15, (d) 1 m s− 1, (e) 12 h and (f) 1 h

Atmospheric stability is indicated using the stability parameter ζ = (zzd)/L, where L is the observed Obukhov length and zzd is the effective height of the observations, equivalent to the JULES forcing height (Section 4). Stability bins are selected as follows: |ζ|≤ 0.05 (near neutral), ζ < − 0.05 (unstable), ζ > 0.05 (stable). Over the evaluation periods, stable atmospheric stratification occurs more often in suburban Swindon (15.6%) than in central London (0.9%). In London, the observed prevalent stability condition is unstable (74% versus 16.3% in Swindon), while in Swindon near-neutral conditions prevail (68.1% versus 25.1% in London).

The bias of QH driven by Tair (Fig. 11a) differs between the two sites. The increasing negative model bias with decreasing Tair in central London supports the earlier assessment of the large importance of QF. Once Tair is greater than \({\sim }18^{\circ }C\), the median QH bias is strongly reduced, while the overall scatter increases. For Swindon, where QF is much lower (Fig. 2), the median bias varies little over a wide range of forcing temperatures. At both sites, the statistical uncertainty of the binned bias at relatively high air temperatures (Tair > 25C) is affected by reduced sample sizes (100 > N ≥ 20).

The variation of the L bias with Tair (Fig. 11b) at both sites indicates that the surface is warmer in the observations compared with the model. In Swindon, the bias is relatively uniform across the range of forcing temperatures once Tair > 0C, while in London the magnitude of negative bias increases for Tair > 20C. At KCL, the radiometer source area contains mostly buildings and roads, while in Swindon L is affected both by impervious and vegetated surfaces (Ward et al. 2013). This could indicate that the model bias is of systematic nature.

As expected, given the fairly homogeneous land-cover distribution around the Swindon site (Ward et al. 2013), the QH bias shows very little variation with wind direction (Fig. 11c). In London, however, there is a distinct response to the patchiness of the distribution of buildings, water and vegetation surrounding the site (Fig. 4). The reduction of the negative median QH bias as the flow approaches the flux tower from the river (100𝜃 ≤ 230) can result from (i) the increased importance of the observed QE relative to QH and (ii) decreasing contributions of QF. This indicates that the UKV–M (east) results (Fig. 5) for the given tile distributions (Table 1a) represent the site characteristics best for southerly wind directions, when the observations are less affected by QF and contain signatures of the river, while a fetch over the urbanised north is associated with too low QH.

Evaluating the QH bias as a function of wind speed (Fig. 11d) is useful to identify the relative importance of larger scale horizontal advection of heat and moisture. While advection affects the observed turbulent heat fluxes, it cannot be modelled offline. It is noted that the different wind-speed characteristics in London and Swindon are affected by the measurement height of the observations (Section 4). At both sites, the variation of the median QH with Uh is low over the range of wind speeds observed, while the bias range (between 5th and 95th percentiles) increases after Uh exceeds \(\sim \)4 m s− 1. This is also the case for QE (not shown). This could indicate that for high wind speeds, advection can become important especially if the offline results are evaluated over shorter (non-climatological) periods. Furthermore, under neutral stability conditions (high wind speeds), the flux footprint becomes much larger (Fig. 7 in Kotthaus and Grimmond 2014b), which introduces changes in the observed heat-flux characteristics.

In London, high wind speeds are more likely to occur from the dominant wind-direction sector of 180–225 (covering region B; wind rose in Fig. 4), which is associated with a larger positive QH bias in UKV–M (east) (Fig. 11c) as evaporation from the river is more likely to affect KCL for flow from this direction. The larger bias variability for high wind speeds at KCL (Fig. 11d), in particular the tendency towards an over-prediction of QH (increased 95th percentiles), is likely impacted by these changes in energy partitioning with 𝜃.

Wind speed also affects QH via the MORUSES resistance network. The resistance to scalar transport is inversely proportional to the modelled canyon velocities (Harman et al. 2004). These scale with the velocity at the top of the canyon that is derived by interpolation from the forcing wind speed. Higher wind speeds imply lower resistances and thus higher turbulent fluxes. The interpolation to the canyon top critically depends on zd and z0, which MORUSES estimates from the Macdonald method that can underestimate the surface roughness in heterogeneous building environments (cf. Section 3.3).

To determine the model response to precipitation, the QE bias is evaluated as a function of hours after the last rainfall event (Fig. 11e). For short periods after rain, evaporation from the impervious urban tiles can be expected to make the largest relative contribution to the overall latent heat flux. In London, where impervious surfaces contribute with 79% to the prescribed land cover (UKV–M east; Table 1a), the variability over the first 12 h following rain is large and followed by a negative median bias over the next 12-h interval, indicating that the model represents conditions as drier than observed. This feature is not as clear in Swindon, where furban is only 0.31 (Table 1b) and the vegetation contribution to QE is dominant. When evaluating only the first 12 h after rain (Fig. 11f), at both sites there is an increased positive model bias of QE over the first 1 to 2 h, which could indicate that the urban tiles dry out too quickly.

JULES partitions precipitation into throughfall, interception, runoff and infiltration (Essery et al. 2001). On the urban tile(s), precipitation reaches the surface without interception and the partitioning of rainwater into infiltration and runoff strongly favours the latter. Surface infiltration is curbed by using a low (0.1) attenuation factor for the saturated hydraulic conductivity, which together with the low value of prescribed surface water capacity (0.5 kg m− 2) results in relatively large urban runoff (Eq. 48 in Best et al. 2011). Runoff characteristics of the JULES–UKV (CTRL–M/B, UKV–M) and JULES–GL7.0 (HAD–B) are affected by the use of different models to represent soil-moisture heterogeneity (Appendix 1). While JULES–UKV uses PDM, JULES–GL7.0 uses TOPMODEL (for details see Best et al.2011), which treat the saturation-excess runoff differently. At London/KCL, all configurations generate large runoff rates within a single model time step, which strongly limits evaporation (Online Resource Fig. S7a). The increased vegetation fraction in Swindon reduces the grid box runoff as rain can be intercepted and held by the vegetation canopy (Online Resource Fig. S7b). At both sites, runoff is generated for precipitation rates exceeding \(\sim \)0.9–1.2 mm h− 1 (JULES–UKV) or \(\sim \)1.3–1.7 mm h− 1 (JULES–GL7.0) based on a linear fit of the data.

7 Conclusions

The Best-1T scheme and the 2-tile canopy model MORUSES in JULES are evaluated using multi-year flux observations in the highly urbanised city centre of London and in suburban Swindon, UK. The model performance is studied under different forcing conditions and using different resolution ancillary information (land cover, building morphology). Simulations based on the JULES science configurations and ancillaries used in the UM for regional NWP (JULES–UKV; 1.5 km resolution) and global high-resolution (\(\sim \)10 km) climate simulations (HadGEM; JULES–GL7.0) are compared with control runs that use model parameters derived from observations and high-resolution (\(\sim \)1 m) GIS data and more representative QF. With respect to the study aims, the following conclusions are drawn:

1. Implications of the JULES–ULSM options and resolution/accuracy of ancillaries

  • JULES with MORUSES and Best 1T performs better at the suburban site, where results to a larger degree are determined by the vegetation modelling, compared with the high-rise, high-density site in central London, where physics differences between the ULSMs are more pronounced.

  • At both sites, the outgoing short- and longwave radiation are better represented than the turbulent heat fluxes. The cross-configuration variability of QH and QE is particularly large in London because of the site complexity (and its representation in the ancillaries) and the different ways in which the ULSMs represent the diurnal variability of surface temperatures.

  • For both MORUSES and Best 1T, the prediction of QH in London is poorest in winter and autumn (increasingly negative median bias in UKV–M below Tair ≈ 18C), when the estimated contribution of QF to the local energy balance is of a similar magnitude as Q. The seasonal performance variability is smaller in Swindon, where QF is low year-round. Using a more representative QF in the CTRL runs for central London improves the model skill.

  • Best 1T with the climate-model land cover and default JULES–GL7.0 urban parameters at both sites shows the largest difference to the observations. The HadGEM3–PRIMAVERA land-cover ancillaries overestimate furban in London, resulting in near-zero median QE, while underestimating urban land cover in Swindon (too high QE) at the \(\sim \)10-km grid scale. The relatively high default bulk urban albedo (0.18) and not accounting for QF further contribute to differences from the observed SEB. The characteristic phase delay of QH (negative bias in the morning) and the positive afternoon bias of L are more apparent in environments with high furban like London.

  • The presence of the roof in MORUSES and its fast response to radiative forcing improves the timing of QH. The high thermal inertia of the MORUSES canyon means that the phase and amplitude of the (weighted) grid box QH critically depends on the prescribed plan-area fractions of roof and canyon tiles, which in the UKV are parameterised through furban. Variations of canyon proportion (W/R) have a larger effect on the amplitude of QH than variations of H/W.

  • MORUSES and Best 1T both limit the water-holding capacity on the urban tile(s) so that rainwater is drained too quickly and the tiles dry out fast. This subsequently limits evaporation and, if no pervious surfaces occur in the grid box (e.g. vegetation or water), can result in errors in the energy partitioning.

2. Potential for model improvements to support model outputs being used for integrated urban services

  • Anthropogenic forcing: Providing a more realistic and flexible way of representing anthropogenic heat emissions is essential to improve the performance of the JULES ULSMs, particularly for applications in high-density city centres. This critically needs to include spatial and temporal variations of QF either through ancillaries or (better) through modelling and should allow for the representation of the interplay between human activity and the urban micro-climate (e.g. feedback between the use of space heating/cooling and outdoor air temperatures). Such options are currently being explored. Similarly, anthropogenic emissions of water (e.g. through street cleaning or irrigation of urban vegetation) can be important in some cities (e.g. Ao et al. 2018; Dou et al. 2019), but are currently not modelled in JULES.

  • Land-cover information: Providing accurate land-cover information is crucial for the model performance. In many cities, urban vegetation (street trees, parks) or water bodies make a non-negligible contribution to the local urban climate. Not representing the sub-grid scale land-cover variability in cities in the model ancillaries can lead to an under-representation of the urban QE. In such cases, applications that rely on heat-flux ratios (e.g. Bowen ratio; evaporative fraction) are strongly impacted, which can affect estimates of water requirements for urban vegetation (e.g. irrigation planning) or health-related applications (e.g. heat stress; thermal comfort).

  • Morphology information: The operational UKV MORUSES morphology parameterisations were derived for London. Hence, access to high-resolution building morphology data is necessary if the model is used elsewhere. For UK applications, it also has to be considered that the empirical relations were derived for \(\sim \)1-km NWP applications, and it is unclear whether they are still appropriate at much higher or lower resolutions. Over London, the UKV is currently run routinely at \(\sim \)300-m horizontal resolution, and with even higher resolved simulations being used (\(\sim \)50–100 m; Lean et al. 2019) the derivation of morphology ancillaries should be revisited and evaluated. Using a representative canyon proportion (W/R) in MORUSES can strongly affect model behaviour and hence is important for NWP and climate modelling. However, for climate services, where it may be crucial to represent processes at street-canyon scale, accuracy of parameters linked to urban geometry (e.g. H/W) is required.

  • Aerodynamic roughness: Current MORUSES model formulations effectively restrict the maximum allowed displacement height to the mean building height in the grid box. However, zd can be large in cities with tall buildings and/or a large heterogeneity of building heights. Reflecting the impact of increased drag on flow profiles is crucial for the modelling of heat exchange between surface and atmosphere, which affects the estimation of thermal stress/comfort in urban environments. For applications in high-rise cities, it needs to be explored whether methods that account for building-height variations are more suitable.

  • Urban vegetation modelling: Even small vegetation fractions in cities can strongly affect the energy partitioning (Loridan and Grimmond 2012a; 2012b), and accurate information on LAI and its seasonal variation is crucial to model evaporation correctly. Urban vegetation and water can be included by assuming a certain amount of both are present in the source urban data when it is processed into the JULES land-cover ancillaries. The urban vegetation and water are then represented although aggregated within the other (non-urban) JULES surface types in the grid box. Representing urban vegetation as separated from the USEB in JULES removes important process interactions, such as the impact of street trees on canyon aerodynamics (e.g. increased surface roughness; flow-regime changes), radiation balance (e.g. more shading; reduced sky-view factor; impact on bulk canyon albedo/emissivity) and heat-flux partitioning. These could be important in scenario studies, for example related to urban greening strategies as mitigation measures for thermal stress. Grimmond et al. (2011) found that ULSM performance of turbulent heat fluxes has a larger variability if vegetation is treated on separate tiles compared with using an integrated approach. The potential of further model improvements by including vegetation in MORUSES should be explored (Young et al. 2015; Young 2018).

The study documented the temporal (seasonal, diurnal) variability of model skill and the necessity to evaluate ULSMs over a range of meteorological conditions. Contrasting the effect of different resolution ancillaries on the results showed that for integrated urban services that are informed by climate-model output an appropriate down-scaler should be used with suitable resolution land-cover information rather than using the output from the driving model directly. Hence, the configuration chosen needs to be appropriate for the purpose of the task. It also needs to be explored how the offline results will translate to coupled (online) simulations of UM–JULES over similar (multi-year) evaluation periods. There is also further need to evaluate the models in other high-density cities, in particular with tall-building canopies, and, once current modelling capabilities have been expanded, for different seasonality and types of anthropogenic forcing.