Urban signals in high-resolution weather and climate simulations: role of urban land-surface characterisation

Two urban schemes within the Joint UK Land Environment Simulator (JULES) are evaluated offline against multi-year flux observations in the densely built-up city centre of London and in suburban Swindon (UK): (i) the 1-tile slab model, used in climate simulations; (ii) the 2-tile canopy model MORUSES (Met Office–Reading Urban Surface Exchange Scheme), used for numerical weather prediction over the UK. Offline, both models perform better at the suburban site, where differences between the urban schemes are less pronounced due to larger vegetation fractions. At both sites, the outgoing short- and longwave radiation is more accurately represented than the turbulent heat fluxes. The seasonal variations of model skill are large in London, where the sensible heat flux in autumn and winter is strongly under-predicted if the large city centre magnitudes of anthropogenic heat emissions are not represented. The delayed timing of the sensible heat flux in the 1-tile model in London results in large negative bias in the morning. The partitioning of the urban surface into canyon and roof in MORUSES improves this as the roof tile is modelled with a very low thermal inertia, but phase and amplitude of the grid box-averaged flux critically depend on accurate knowledge of the plan-area fractions of streets and buildings. Not representing non-urban land cover (e.g. vegetation, inland water) in London results in severely under-predicted latent heat fluxes. Control runs demonstrate that the skill of both models can be greatly improved by providing accurate land cover and morphology information and using representative anthropogenic heat emissions, which is essential if the model output is intended to inform integrated urban services.


Introduction
As urbanisation levels and urban populations continue to grow (United Nations 2018), there is an increasing need for urban climate services (Baklanov et al. 2018). Climate models need to represent future extremes correctly, such as the intensity and frequency of heat waves, droughts or NWP resolution. To capture the impact of cities on surface-atmosphere interactions, urban land-surface models (ULSM) are used within NWP and climate models. ULSMs provide a solution to the urban surface-energy balance (USEB; e.g. Oke et al. 2017): where Q * = (K ↓ − K ↑ ) + (L ↓ − L ↑ ) is the net allwave radiation composed of the incoming (↓) and outgoing (↑) shortwave (K) and longwave (L) radiation components. Q F is the anthropogenic heat flux, and Q H and Q E the turbulent sensible and latent heat fluxes. ΔQ S is the net storage heat flux that includes the storage of heat within the urban fabric (thermal inertia, CdT * /dt, where C is the areal capacity and T * the surface temperature; Porson et al. 2010a), and the ground-heat flux. Net horizontal advection of heat or moisture is included on the right side of Eq. 1 if the ULSM is coupled to an atmospheric model.

Representation of urban processes
Urban processes are represented in atmospheric models with varying levels of morphological and physical complexity depending on computational resources and resolution ( Fig. 1). ULSMs used in NWP or climate simulations parameterise the effects of urban environments on surface fluxes without representing buildings explicitly. Numerous ULSMs exist (Grimmond et al. 2009) using different approaches to model sub-grid scale processes (Grimmond et al. 2010;Grimmond et al. 2011). With the 'tile' approach, urban areas are treated separately and in parallel with other surfaces in the grid box (e.g. vegetation or water). Simpler modelling approaches modify vegetation schemes (e.g. Best 2005) by using bulk surface parameters that are representative of impervious, urbanised surfaces (e.g. increased heat capacity and surface roughness). More complex models use parameterisations of processes in simple street canyons, for which morphology parameters can vary spatially (e.g. Masson 2000; Martilli et al. 2002;Porson et al. 2010a).
A key problem at this scale is the description of the subgrid-scale heterogeneity of the urban surface in the models .
Increasing the level of detail with which urban processes are modelled becomes important as grid resolutions approach the urban grey-zone regime (O(10 − 100 m)) both in terms of physics (turbulent exchange is now partially resolved) and buildings (morphology variations from one grid box to the next can be significant). Neighbourhood to building-scale simulations feature building-dominant σ H , standard deviation of building heights; W , street-canyon width; z 0 , roughness length for momentum; λ p , plan-area fraction; λ f , frontal-area fraction processes that need to be directly resolved and coupling of indoor-outdoor environments becomes possible.

Focus of this study
The choice of ULSM for a specific application is often constrained by practical considerations (e.g. Best 2006a), for example the computational resources available. With increasing model complexity, the demands on spatially varying ancillaries increase. Land-cover and building morphology/material information or population density and energy-use data to represent anthropogenic heat emissions may not be readily available at the spatial/temporal scales of interest.
The Met Office Unified Model (UM) is routinely run on regional to global scales in both climate mode (HadGEM) and NWP setups using different complexity ULSMs available in the Joint UK Land Environment Simulator (JULES; Best et al. 2011;Clark et al. 2011). The UM-JULES modelling suite has many international users beyond the UK, e.g. meteorological services in Australia, Singapore and South Korea, and an increasing demand for its application over highly urbanised areas of the world such as China and India. While HadGEM simulations currently use the simple one-tile (1T) urban scheme in JULES (Best 2005), since March 2016 the regional UKV NWP model (1.5-km horizontal resolution over the UK) is run operationally with the urban-canopy model MORUSES (Met Office-Reading Urban Surface Exchange Scheme; Porson et al. 2010aPorson et al. , 2010bBohnengstengel and Hendry 2016).
In this study, we focus on the following questions: 1. What are the implications of ULSM options, together with resolution and accuracy of ancillary information, in current UM-JULES modelling environments used for NWP and climate simulations? 2. How can current urban modelling capabilities in UM-JULES be improved, especially with regard to using the model output for integrated urban services?
To address these questions, the JULES ULSMs are evaluated offline against multi-year flux observations in the UK in two different urban settings: (i) the high-density, high-rise city centre of London (King's College London Strand campus; KCL); and (ii) the suburban town of Swindon, located ∼115 km west of London. The offline runs use the JULES configurations of (i) the operational UKV NWP setup and (ii) Global Land 7.0 (GL7.0; Walters et al. 2019; Wiltshire et al. 2020) used in HadGEM3 climate simulations. This is the first time that the JULES ULSMs are evaluated together over multi-year periods, allowing the study of seasonal variations in model performance under a range of different conditions.

Urban models in JULES
JULES is a community land-surface model that uses a tile approach to represent the subgrid-scale heterogeneity of land cover ). This allows no interaction between tiles below the first atmospheric model level, where the total (grid box-averaged) surface fluxes are obtained by weighting the tile variables by their plan-area fractions (Best et al. 2006).
Urban areas are represented in JULES either using the (Best 2005) single urban tile scheme (Best 1T) or using a 2tile (2T) approach that splits the impervious urban surface into a roof and a street canyon. For the 2T approach, JULES has two options: (i) a simple 2T slab scheme (Best 2T;Best et al. 2006) and (ii) MORUSES (Porson et al. 2010a;2010b).

Representation of the USEB
The Best 1T and Best 2T approaches solve the SEB of a modified vegetation canopy that interacts with the first soil level via longwave radiative coupling. Urban bulk parameters for the canopy (Best 1T) or the roof/canyon combination (Best 2T) are prescribed (urban albedo α U , urban emissivity ε U , bulk areal heat capacity C U , roughness length for momentum z 0 and the ratio z h /z 0 , where z h is the roughness length for heat). The maximum available surface water mass and infiltration rates are reduced to limit the evaporation over the impervious tile (Best 2005). In standard UM-JULES configurations, urban parameters for the Best models are the same for each city and across a city. While this is not an intrinsic model limitation, allowing for spatial variations of model parameters would require significant code changes in JULES.
MORUSES includes several modifications of the above: (i) bulk parameters for the street-canyon tile (canyon albedo α c , emissivity ε c and areal heat capacity C c ) are calculated from canyon geometry and prescribed material properties for roads and walls (material albedo α, emissivity ε, volumetric heat capacity c, thermal diffusivity k); (ii) z 0 and z h are calculated explicitly (Section 3.3); and (iii) spatially varying urban morphology data can be provided . The presence of a street canyon (building height H , street width W ) in MORUSES allows modelling of short-and longwave radiative trapping. Turbulent scalar fluxes are computed through a resistance network that responds to canyon geometry, modified from Harman et al. (2004). Three flow regimes are used: (i) isolated-building regime (H/W ≤ 1/3), (ii) wake interference (1/3 < H/W ≤ 2/3) and (iii) skimming flow (H/W > 2/3). The net storage heat flux (ΔQ S ) depends on thermal material properties and canyon geometry (Porson et al. 2010a). The canyon/roof tiles are conductively/radiatively coupled to the ground. For the canyon tile, only the road facet is coupled to the soil (not the walls). The insulation behaviour of the roof tile can be controlled via a prescribed thermal diffusivity or by using a very small roof thickness, Δz f (Porson et al. 2010b).

Treatment of urban vegetation
UM-JULES typically uses five plant functional types (PFT) in separate tiles: broadleaf tree, needleleaf tree, C3 grass, C4 grass, shrubs. Urban vegetation is modelled using these vegetation tiles and not as part of the urban SEB. As a result, processes in built and vegetated areas do not interact below the first model level and Q E only originates from urban tiles after rainfall. Shortcomings of this approach have been discussed by Grimmond et al. (2011) and Grimmond (2012a, 2012b).
Prescribing urban anthropogenic moisture sources (e.g. irrigation) in JULES is not possible currently (Best and Grimmond 2016). While this is not critical for the evaluation sites considered in this study, effects of urban irrigation can be important in some cities (e.g. Ao et al. 2018;Dou et al. 2019).

Anthropogenic heat emissions
Currently, JULES (vn5.4) has two options for the emission of Q F over the urban tile(s): (i) effectively none (0 W m −2 ) and (ii) fixed values (JULES-Q F ) derived from the Digest of United Kingdom Energy Statistics (DUKES 2003). The latter is used operationally in the UKV and represents a UKaveraged monthly mean Q F for 1995-2003 adjusted to the fraction dissipated over impervious/built areas. Consistent with this, when Q F is added to the SEB in JULES it is only applied to the urban tile(s) and set to zero elsewhere. Hence, there is spatial variability at the grid scale as it is proportional to the urban fraction in the grid box. However, there are no local adjustments that account for differences between city centres and suburban areas, which arise from changes in building type and volume, population density and traffic volume (e.g. Sailor 2011;Iamarino et al. 2012). The minimum (unscaled; i.e. 100% urban) monthly JULES Q F for the UK occurs in August (∼17 W m −2 ), the maximum in December (∼26 W m −2 ; Fig. 2).   Ward et al. 2013). To compare with JULES, the hourly Q F data for London and Swindon are aggregated into monthly medians. JULES -Q F is shown both in its unscaled (100% urban) version and scaled to the urban tile proportions at the two sites (66% for London; 49% for Swindon; Table 1). The scaled values are representative of those used in the simulation.
In MORUSES, Q F is either released only in the canyon or is evenly distributed among canyon and roof. The latter is the UKV default. The prescribed monthly values are linearly interpolated to the model time step, but diurnally constant. Hence, variations of Q F related to sleep/work patterns, different energy demands on weekdays/weekends, public holidays or as a result of daylight-saving time, dayto-day or temperature conditions (Grimmond 1992;Sailor and Vasireddy 2006;Gabey et al. 2019) are not reflected. Using a fixed daily value can result in too large/low Q F releases during the night/day (Grimmond et al. 2010).

Offline JULES-USLM configurations
JULES is run offline over 3 years (January 2011-December 2013) at an hourly time step using surface observations in central London and suburban Swindon as forcing data (Section 4; Table 5a). A modified version of JULES vn5.2 is  Wiltshire et al. 2020) and ∼10 km resolution ancillaries of the N1280 HadGEM3-PRIMAVERA GCM system; using Best 1T (hereafter labelled: HAD-B) 2. UKV: Operational JULES-UKV science configuration (OS40) and 1.5-km resolution UKV ancillaries (OS41); using MORUSES (UKV-M) 3. Control: Operational JULES-UKV (OS40) together with high-resolution (1 m) land cover, building morphology and roughness data and the JULES leafphenology model (Cox 2001;Clark et al. 2011); using MORUSES (CTRL-M) and Best 1T (CTRL-B) It is noted that while the CTRL-B runs use JULES-UKV, the SEB statistics remain mostly unchanged if JULES-GL7.0 is used with urban parameters and Q F options kept the same (cf. seasonal diurnal statistics in the Online Resource Figs. S4, S5). Small differences stem from the hydrology and inland-water parameters (Appendix 1). Hence, 'CTRL-B v. HAD-B' (as 'CTRL-M v. UKV-M') differences are not from the model science, but reflect the impact of ULSM parameter choices, treatment of Q F and differences in prescribed land cover.

Land-cover information
The operational UKV land-cover ancillaries are based on ∼25-m resolution 1990 UK land-cover data (CEH 1990). The HadGEM3-PRIMAVERA ancillaries are derived from global ∼1-km resolution data of the International Geosphere-Biosphere Project (IGBP; reference years 1995/1996; Walters et al. 2019, and references therein). Figure 3a shows an urban versus non-urban binary land-cover classification over the south-east UK from the Global Urban Footprint (GUF) project that used TerraSAR-X/TanDEM-X radar data (∼12-m resolution, reference year 2011; Esch et al. 2011;Esch et al. 2017;Esch et al. 2018).
The spatial representativeness of tile fractions for UKV-M and HAD-B varies with grid resolution of the atmospheric model (1.5 km and 10 km, respectively). The fraction of urban land cover (f urban ) in the UKV (Fig. 3b) and HadGEM3 (Fig. 3c) varies between the Greater London area and the surrounding urban regions. The coarser resolution HadGEM3-PRIMAVERA ancillaries (Fig. 3c) may make the outskirts or small urban areas (like Swindon) appear more vegetated. Urban fractions for Greater London, on the other hand, are over-estimated compared with the CEH-based UKV ancillaries ( Fig. 3b; cf. Bohnenstengel and Hendry 2016). Highly urban coastal land tiles (Fig. 3c) are likely artefacts from the land-sea mask overlapping with only a small fraction of land with large f urban . Affected grid cells are mostly sea, so the effect on the grid box results is small. The land cover for the HAD-B and UKV-M runs is extracted from grid boxes containing the observation sites. In the HadGEM3 ancillaries, Swindon is represented by a single urbanised grid box (f urban = 0.23, Table 1b). The KCL site in central London is located in a grid box with a very high urban fraction (f urban = 0.98; Fig. 3c). In the UKV ancillaries, KCL sits at the border of two grid boxes (UKV west/east; Fig. 4) with quite different land cover (Table 1a). UKV-M at KCL is run for both grid boxes as the resolution accuracy of the computational mesh (O(10 m)) means that the site can be situated in either.
Land-cover and building-morphology parameters for the CTRL runs are for a 500-m radius area around each site, covering the source areas of the turbulent fluxes and radiation instruments (Ward et al. 2013;Kotthaus and Grimmond 2014b). For Swindon, the CTRL runs use the land cover reported by Ward et al. (2013) and Ward et al. (2016). For London/KCL, the land cover is derived from high-resolution (1m) GIS data using the Urban Multi-scale Environmental Predictor (UMEP; Lindberg et al. 2018). At this site, the land cover varies strongly by wind direction. The River Thames dominates the area to the south, whereas the north is densely built-up (Fig. 4). For the CTRL runs, the total 500-m radius area (region T) is used, together with four additional segments (regions A, B, C and D; Fig. 4) that are sub-sampled based on wind direction for the evaluation of the turbulent heat fluxes (Section 5.1). At Swindon, there is slightly more vegetation to the north-east and slightly more impervious surfaces to the south-west (Fig. 2 in Ward et al. 2013), but land-cover variations are not as marked as in London.

Building morphology
The input parameters required for MORUSES are as follows: mean building height (H ), canyon aspect ratio (H/W ) and canyon-width ratio (W/R), where W is the street-canyon width and R is the combined length of canyon and roof. The ratios can be related to the plan and frontalarea fractions (λ p , λ f ; Porson et al. 2010a) via: where λ p is the proportion of the urban tile occupied by buildings (i.e. the roof fraction), so that W/R is the fraction of non-building impervious (street canyon) surface.
The operational UKV morphology ancillaries are generated from empirical polynomial relations derived from high-resolution (O(1 m)) 3D London morphology data (Virtual London dataset; Evans et al. 2006) that form the basis of the parametrisations of λ p , λ f  and H (Gilham et al. 2019) with the urban fraction in the grid box, f urban . Given the ∼1-km resolution for NWP, a 50-m radius smoothing filter removed small-scale land-cover variations (details in Bohnenstengel et al. 2011). As the empirical relations were only derived for one city, their use elsewhere should be assessed or the MORUSES morphology parameters should be derived from local data.
For the CTRL runs in London, H , λ p and λ f are derived directly from 1-m resolution GIS data. For consistency, Eq. 3 is used to obtain H/W despite UMEP being able to derive the aspect ratio directly (Lindberg et al. 2018). For the Swindon site, the morphology and roughness characteristics reported by Kent et al. (2018) are used.

Roughness parameters
Roughness and morphology parameters for the offline runs are summarised in Tables 2a, 3a and 4. HAD-B uses the Fig. 4 Land-cover distribution in central London around the KCL site (red dot), with wind rose (January 2011-December 2013; inset) and the UKV west/east grid boxes. The 500-m radius area (large circle; region T) contains the 80% source area of turbulent heat fluxes for the site (neutral and unstable conditions). Labels (A-D) indicate the four sectors defined to sub-sample the CTRL runs. CTRL-T is for the complete circle default Best-1T z 0 of 1 m. In all runs with the Best-1T model, z h /z 0 uses the fixed default value of 10 −7 (Best et al. 2006).
In MORUSES, z h is calculated from the bulk resistances to scalar transport of the roof and canyon tiles (Porson et al. 2010a, and Eq. 10 in Appendix 2). The canyon resistances vary in response to H/W (Section 2.1). In the isolatedbuilding flow regime (H/W ≤ 1/3), the upper limit of z h in MORUSES is fixed to 0.1z 0m , where z 0m = 0.05 m is the prescribed material roughness length for momentum.
The resistance network is driven by the flow at the roof and canyon top, obtained by extrapolating from the model-forcing level without correcting for stability effects and assuming that the logarithmic layer extends down to the building top (Harman et al. 2004). The roughness parameters, z d and z 0 , needed for this calculation can be obtained from the Macdonald et al. (1998) morphometric method available in MORUSES (Eqs. 8 and 9; Appendix 2) or can be prescribed. For the CTRL runs, the Macdonald z 0 and z d are calculated externally and then prescribed. Value differences between UKV-M and CTRL (Table 2a) are caused by the differences in resolution and method of derivation of the underlying morphology data.
Modifications to the Macdonald method to account for building-height variability, like the Kanda et al. (2013) approach, have been shown to provide better predictions of wind speeds above cities (Kent et al. 2017;Kent et al. 2018a). While in the Macdonald approach z d scales with H , Kanda's modification accounts for the maximum building  (Table 2a).
Being able to reflect the impact of increased drag on the flow is crucial if the ULSM is applied in environments with tall buildings and/or a large heterogeneity of building heights. Although the Macdonald model available in MORUSES effectively limits the displacement height such that z d ∈ [0, H ] (Eq. 8), the facility exists to supply other data as ancillaries. Conceptually, JULES estimates the surface exchange at a height of z d + z 0 , representing an effective surface in the model. This means that the effective height of the street canyon in MORUSES is H − z d − z 0 . If H ≤ z d + z 0 , as can be the case in high-rise city centres, the velocity at the canyon top currently defaults to zero. MORUSES requires further development in order to make use of parameterisations that enter this regime. For the roof tile, the criterion is slightly less strict with 1.1H ≤ z d + z 0 .

Radiation and thermal controls
Urban radiative and thermal parameters used in this study are summarised in Table 2b and c. UKV-M prescribes the same radiative and thermal urban fabric properties across UK cities, assuming asphalt roads, brick walls and clay roofs (Porson et al. 2010a;Bohnenstengel et al. 2011). In London, the CTRL-M radiative parameters are adjusted to Table 2 Overview of parameters used for JULES (Best 1T and MORUSES) Table 3. UKV-M parameters at KCL are for west and east grids ('L W ', 'L E '). For sources of the data, see text   provide the best match to the observed bulk parameters reported by Kotthaus and Grimmond (2014a). For CTRL-M at Swindon, the default UKV parameters are representative of typical building materials at this site (Ward et al. 2013).
In MORUSES, the 'thin-roof' setting is used (UKV default), by which insulation is modelled through a very Table 4 Ratio of roughness lengths for heat and momentum for the MORUSES runs at (a) London/KCL and (b) Swindon

Vegetation controls
LAI and canopy height (H c ; m) for UKV-M and HAD-B are based on the UKV and HadGEM3-PRIMAVERA ancillaries that vary spatially and by month. In this data source, LAI is effectively zero for all PFTs in highly urbanised grid boxes such as central London, although vegetation is present (Fig. 4). For UKV-M at Swindon, this also affects the tree and shrub tiles. The LAI ancillaries are based on the relatively coarse IGBP land-cover mapping (Fig. 3c), which for the UKV can lead to inconsistencies with the CEH land cover. The canopy height, however, is non-zero so that roughness effects of the vegetation are modelled.
In case of near-zero LAI, JULES treats the vegetated tiles effectively as bare-soil surfaces with zero evapotranspiration and minimised water-holding and interception capacity . In this study, the affected vegetation fractions are low (0.01-0.12; Table 1), but even small amounts of vegetation can have a strong impact on the surface-energy partitioning in urban areas (Grimmond et al. 2010;Loridan and Grimmond 2012a;2012b).
Another feature shared by the UKV and HadGEM3-PRIMAVERA ancillaries over the southern UK is an anomalous rise in LAI in December/January following a minimum in November. This affects all PFTs and stems from the underlying MODIS Terra 4-km LAI dataset (collection 5; 2005-2009). The newer MODIS (collection 6) products do not show this artefact over the same analysis period.
The CTRL runs use the JULES leaf-phenology model (Cox 2001;Clark et al. 2011) to simulate leaf-on/leaf-off periods in response to temperature changes and the UKV H c ancillaries are prescribed.

Anthropogenic heat flux
The CTRL runs use the monthly median GQF results for London/KCL and inventory emissions for Swindon (Fig. 2), both with and without scaling to the urban fraction (cf. Section 2.3). The former option is consistent with the way in which the value is used within JULES and the latter option results in an underestimation of the heat supplied to the system with respect to the GQF and inventory emission values. This is because JULES assumes that Q F is released only on the urban tile(s); thus, the grid value is correspondingly reduced. The GQF and inventory data, however, are for the complete area. Figures S1 and S2 in the supplementary Online Resource show results from using the scaled values of Q F , consistent with the assumptions within JULES. CTRL-B in London (Fig. S1d) has a reduced Q H MBE over all seasons and CTRL-M/B in DJF both benefit from the increased magnitude of Q F when compared with the results using the unscaled values (Section 6.1.2; Fig. 5d), but the scaling leads to larger errors overall. However, there is a large uncertainty in the estimation of Q F and it is unclear what a representative magnitude of Q F ideally should be for the sites. Therefore, we discuss the CTRL runs with the unscaled Q F in Section 6.
In UKV-M, the default JULES-Q F values are used (Fig. 2), and Q F is neglected in HAD-B, reflecting the JULES settings for the UKV and GL7.0.

Model spin-up and output
For each of the four JULES soil layers (0.1, 0.25, 0.65 and 2.0 m depths), soil temperature and soil wetness (mass of soil water as a fraction of water content at saturation) are initially prescribed. The run is considered 'spun-up' if the change in soil-moisture content (kg m −2 ) is below 1% and the change in soil temperature below 0.2 K over a spin-up cycle. The first year of forcing data is used for the spin-up.
The soil states reached after the successful spin-up completion differ in response to the JULES configuration and land-cover ancillaries (cf. Appendix 3). The differences in soil states are maintained (qualitatively) throughout the evaluation period, indicating another difference between experimental setups. However, repeating the CTRL-M/B simulations with swapped initial conditions in a sensitivity test, in London results in negligible differences between the turbulent heat-flux statistics after the first 4 months or so of the simulation. Whereas in the more vegetated Swindon, differences persist over the first year with the largest impact in summer.
Results from the offline runs are a sample at the model time step (1 h). Seasonal and diurnal statistics are computed in a post-processing step, in which the model output is analysed only when observations are available (Section 4).

Observations
For model forcing and evaluation hourly averages measured in central London (KCL; 51.511 • N, 0.117 • W) and Swindon (51.585 • N, 1.798 • W) are used. The London/KCL data comprise measurements from two sites on the same roof (horizontal distance < 45 m; vertical offset 1.4 m), which operated consecutively during the study period (Kotthaus and Grimmond 2014a, b;Bjorkegren et al. 2015). Following Ward et al. (2016), we define the measurement height as the height of the wind-speed measurements above ground level (agl), which is 10.6 m in Swindon and 49.6 m at KCL. The latter is the average height of both KCL sites.
The forcing height in JULES is the height of the instruments above the local z d (Table 2a; KCL z d for region T). For KCL, this is 36.4 m; for Swindon 8.6 m. At both sites, the JULES forcing heights for temperature/humidity (z T ,q ) and wind (z u,v ) are equal.
Table 5 (Appendix 4) gives an overview of observed variables and instrumentation. Observed forcing variables (Table 5a) are K ↓ , L ↓ , air temperature (T air ), rainfall rate (RR), wind speed (U h ), specific humidity (q) and atmospheric pressure (P ). Periods of snowfall and settled snow are insignificant at both sites during the evaluation periods (Ward et al. 2016). The forcing data are gap-filled observations used in the offline evaluation of the SUEWS land-surface model for 01 January 2011-31 December 2013 (Ward et al. 2016).
The evaluation focuses on the outgoing radiation components (K ↑ , L ↑ ) and the net all-wave radiation (Q * ), measured with radiometers, and the turbulent heat fluxes (Q H , Q E ) derived from eddy-covariance measurements (Table 5b). Evaluation data are available at Swindon over a 2-year period (09 May 2011-25 April 2013) and in London for the entire 3-year simulation span. Occasional data gaps within these periods generally are of short duration. Detailed descriptions of site characteristics, instrumentation, data processing techniques and analyses are provided by Kotthaus and Grimmond (2014a, b) and Bjorkegren et al. (2015) for London/KCL and Ward et al. (2013) for Swindon.

Source-area sub-sampling
The JULES tiling scheme does not account for the distribution of land-cover features within a grid box. In order to reflect the difference in the eddy-covariance source-area characteristics around London/KCL (Fig. 4), the CTRL-M and CTRL-B results are created by subsampling the grid box Q H and Q E of the individual CTRL runs for sectors A to D based on the observed horizontal wind direction (θ ). For the 3-year study period, 52.6% of samples fall into region A, 9.6% in B, 5.8% in C and 32% in D (cf. wind rose, Fig. 4).
The source area of the radiometers is independent of θ , but shadowing fractions vary by time of day and seasonally. In London, the runs for region T are compared with the radiation observations.

Evaluation metrics
To quantify the model performance, predicted (P i ) and observed (O i ) variables are compared in terms of evaluation metrics. These include the mean absolute error: the mean bias error: the normalised centred root-mean-square error: and the correlation coefficient (Pearson's R): where N is the sample size and σ O and σ P are the standard deviations of observations and predictions, respectively.

Seasonal-diurnal USEB variability
SEB components are analysed in terms of median diurnal cycles of grid box averages (i.e. averaged over all tiles) in central London ( Fig. 5; 3-year period) and suburban Swindon ( Fig. 6; 2-year period) in winter (DJF), spring (MAM), summer (JJA) and autumn (SON). Differences in the JULES-ULSMs performance are overall more pronounced for London, as in Swindon the urban plan-area fraction is a much smaller part of the grid box and therefore the contribution to the fluxes is lower (Table 1). Hence, the Swindon results are to a much larger degree determined by processes in the vegetated canopies.

Radiation and storage-heat flux
The energy available for partitioning into the turbulent heat fluxes is determined from Q * + Q F − ΔQ S (Eq. 1).
As Q F is prescribed, first it is important to evaluate the outgoing radiation (K ↑ , L ↑ ) and storage-heat flux. While the radiometer at London/KCL sees almost exclusively roofs, urban canyons and some trees, in Swindon K ↑ and L ↑ are to a large degree affected by vegetation.
Differences in prescribed bulk and material albedos (Table 2b) impact K ↑ (Figs. 5a, 6a). At both sites, HAD-B consistently shows the largest deviation from the observations. This can partly be attributed to the comparatively large default urban albedo used in JULES-GL7.0 (α U = 0.18). The different land-cover makeup for HAD-B compared with the observed source area also contributes to the differences in the grid box K ↑ , particularly in London through a larger contribution from the urban tile (f urban = 0.98). However, even when prescribing land cover that is representative of the radiometer source area (as in CTRL), the HAD-B error remains larger compared with the other runs (cf. Online Resource Figs. S4a, S5a), highlighting the impact of α U . CTRL-B (Figs. 5a, 6a) uses the observed α U of 0.11. In London, this lower bulk urban albedo together with the addition of 21% inland water (α = 0.06) in CTRL-B (region T, Table 1a) greatly improves the agreement with the observations over 24 h (low MAE and |MBE|).  Table 2 for run parameters  S3b,c). α c reported in Table 2b is the JJA median at solar noon, when the albedo has the greatest impact as K ↓ is generally at its peak. However, especially for the runs using the UKV-M material albedos, α c shows strong seasonaldiurnal variability (Online Resource Fig. S3b,c). In London, CTRL-M (region T) computes a very low JJA canyon albedo of 0.04 (much lower than the roof albedo of 0.13) that varies only little by season or time of day. Using the default UKV material albedos, CTRL-M in Swindon and UKV-M at both sites have the same JJA α c of 0.08. However, the differences between α c for UKV-M east/west are larger in other seasons (cf. Online Resource Fig. S3b) as trapping effects in MORUSES are more pronounced when the sun is high, and hence are stronger in summer than in winter and stronger for larger H/W .
A bulk urban albedo, α U , for the canyon-roof combination can be obtained as the weighted average of α c (here the JJA median) and α f (Table 2b) to compare with the prescribed Best-1T value. While CTRL-M in London has an α U of 0.09, the UKV-M runs result in bulk albedos of 0.11 (east grid) and 0.12 (west grid; Table 2b). For both UKV-M runs, the over-prediction of the grid box K ↑ is notably reduced compared with HAD-B (Fig. 5a). As in both HAD-B and UKV-M (west), the results are dominated by the urban tile(s) (98% and 95% total urban fractions, respectively); the bias reduction seen for the latter can be attributed to the lower α U computed by MORUSES. Differences between the UKV-M east/west results can be explained by the contribution of the inland-water tile in the east-grid run (14% ; Table 1a), which has a very low prescribed albedo (0.06 in JULES-UKV). In Swindon, the larger amplitude of K ↑ compared with London is better predicted by all models, with HAD-B again having the largest over-prediction attributable to the larger α U (cf. Online Resource Fig. S5a).
In London, Best-1T (CTRL-B, HAD-B) has a notably larger grid box L ↑ (Fig. 5b) in spring and summer compared with the observations, with a ∼1-h delay of the peak and larger amplitudes throughout the afternoon and early night. In Swindon (Fig. 6b), this feature is present in the CTRL-B results (f urban = 0.49), but not in HAD-B, for which the urban fraction is much lower (0.23) and the grid box L ↑ is dominated by vegetation. MORUSES captures better the phase of the observations, but shows a slight systematic under-prediction of L ↑ over 24 h at both sites and for all configurations.
In addition to the surface emissivity (Table 2b), L ↑ is controlled by the surface temperature (T * ), which is affected by the effectiveness of heat storage within the urban tile and hence by the areal heat capacity. For CTRL-B and HAD-B, the bulk urban C U is fixed at 0.28 MJ K −1 m −2 (Table 2c), allowing the urban tile to store a large fraction of Q * . The radiative coupling to the ground means that the results can also be affected by differences in soil-temperature states between configurations. In London, for example, the HAD-B soil in each layer is colder by ∼2 K compared with CTRL-B (Online Resource Fig. S8a), which may lead to a reduction of the surface temperature and hence L ↑ . Soil temperatures would be reduced further if HAD-B used the region-T land cover (not shown), leading to an improvement in grid box L ↑ in MAM and JJA as f urban is lowered (Online Resource Fig. S4b). However, the over-prediction in spring/summer persists if Q F is accounted for in HAD-B, as illustrated by the CTRL-B results (Fig. 5b).
In MORUSES, the different heat capacities for canyon and roof (C c , C f ; Table 2c) allow large differences in thermal inertia. The roof areal heat capacity is computed from C f = c f Δz f . In all MORUSES runs, this is equal to the UKV-M value of ∼0.04 MJ K −1 m −2 , resulting from the 'thin roof' setting with Δz f = 0.02 m, which is an order of magnitude lower than the computed material thickness of walls and road (Δz r,w ∝ k 0.5 r,w ). The canyon areal heat capacity, as the sum of wall and road components, is directly proportional to the height-to-width ratio: C c = 2H W −1 c w Δz w + c r Δz r (Porson et al. 2010a). The large effect of H/W on C c is illustrated in the values of the four London/KCL source areas (Fig 4). C c varies between 0.41 (region C; H/W = 0.36) and 1.18 MJ K −1 m −2 (region A; H/W = 1.66; Table 3a, c). For region T, it takes a value of 0.9 MJ K −1 m −2 (H/W = 1.18; Table 2a, c). The heat-storage capacity of the MORUSES canyon in London is larger than for the Best-1T urban slab (Table 2c). An approximate bulk C U for MORUSES can be computed from C c and C f based on the proportions of roof and canyon, resulting in C U = 0.42 MJ K −1 m −2 for CTRL-M (region T), followed by UKV-M (west) and UKV-M (east) with 0.33 and 0.26 MJ K −1 m −2 , respectively (Table 2c).
As the areal heat capacity increases, the diurnal amplitude of T * decreases. As a result, L ↑ decreases during the day as more heat is stored in the urban fabric, and increases at night when it is released back into the atmosphere. The comparatively large areal heat capacities of the Best-1T slab and the MORUSES canyon, together with the radiative/conductive coupling to the soil, result in a phase delay of L ↑ . However, in MORUSES, this lag is partially compensated by the fast response of the insulated roof to radiative forcing and its contribution to the weighted average of L ↑ for the grid box, which contributes to the improved agreement with the observations. The effects of H/W and W/R on MORUSES are further discussed in Section 6.3.
The good match between modelled and observed Q * for the MORUSES runs and CTRL-B can be explained partially by compensating errors between K ↑ and L ↑ (Figs. 5c, 6c) and the large role of the observed K ↓ (model forcing) in Q * . The London HAD-B run has the poorest 24-h MAE, and under-predicts Q * in spring and summer as a result of the over-prediction of both outgoing radiation components (Fig. 5c).

Turbulent heat fluxes
Modelled turbulent heat fluxes (grid box-averaged) agree better with the observations in suburban Swindon (Fig. 6d,e) compared with central London (Fig. 5d,e), as was found in the SUEWS land-surface model evaluation at these sites (Ward et al. 2016). For both Q H and Q E , the configuration variability is lower in Swindon as parameter and physics differences of the urban schemes are partially masked by the larger contribution of non-urban surfaces compared with London (Table 1).
In London, Q H from CTRL-B and HAD-B has a consistent phase shift of up to ∼3 h relative to the observations (Fig. 5d). Differences between the two Best-1T runs can be attributed to (i) including realistic Q F emissions in CTRL-B rather than assuming Q F = 0 W m −2 as in HAD-B; (ii) the lower bulk albedo in CTRL-B resulting in a larger Q * ; and (iii) the increased contributions of non-urban tile fluxes to the grid box average in CTRL-B (Table 1a), somewhat reducing the phase delay from the urban tile. The MORUSES runs (CTRL-M, UKV-M) capture the phase of Q H better since the large differences between the canyon and roof SEBs are represented. The remaining delay in the timing of the grid box Q H for the two UKV-M runs can be attributed to the larger W/R (Tables 2a, 3a) compared with the sub-sampled CTRL-M results, which are dominated by region A (52.6%) and D (32%). The small MORUSES C f results in a faster and earlier increase of the roof T * compared with the canyon, leading to an earlier phase and increased amplitude of Q H . The larger thermal inertia of the canyon reduces the diurnal variability of the canyon temperature and delays the daytime Q H peak.
For the London site, the largest relative differences of the peak Q H between configurations occur in SON and DJF (Fig. 5d), when anthropogenic heat emissions from space heating are higher. The observed SON/DJF Q H does not become negative at night, and is higher in winter than autumn (median of ∼50 cf. ∼40 W m −2 ). The DJF Q H for HAD-B, which neglects Q F , is negative over large parts of the day. Using JULES-Q F (Fig. 2) in UKV-M improves this, but the results are still closer to HAD-B than to the CTRL runs that use a higher Q F representative of central London from GQF. In winter, the monthly median GQF Q F (Fig. 2) is larger than the observed median peak Q * (Fig. 5c), and hence makes a large contribution to the energy balance. The impact of the representation of Q F on the modelled Q H is discussed in Section 6.2. Adjusting Q F supplied to the CTRL runs by the local urban fraction (cf. Section 3.6), so that it is consistent with the way in which JULES applies it, increases the magnitude of the grid box Q H , which in London leads to better daytime results in winter and autumn, but a notable over-prediction at night in all seasons (Online Resource Fig. S1d). The positive nocturnal bias is a result of not using the diurnal cycle of the GQF results, which have much higher Q F during the day. The smaller f urban in the Swindon grid box makes the Q F -scaling effect less prominent (Online Resource Fig. S2d).
In Swindon, Q F contributes less to the energy balance and the seasonal variability of model performance is reduced compared to London. HAD-B is the only model configuration consistently underestimating Q H . However, if HAD-B has the CTRL land cover, the performance is similar to CTRL-B (Online Resource Fig. S5d), i.e. the difference can be attributed to the ∼10-km HadGEM3-PRIMAVERA ancillaries with more non-urban surfaces assumed. Hence, compared with CTRL-M/B (f urban = 0.49) and UKV-M (0.31; Table 1b), HAD-B has larger Q E (Fig. 6e). The JULES-ULSMs over-predict the sensible heat flux at night and do not reproduce the negative nighttime Q H in Swindon in summer (Fig. 6d), similar to SUEWS (Ward et al. 2016).
At both sites, the ratio of roughness lengths of heat and momentum computed by MORUSES is larger than the default Best 1T value of 10 −7 , in the case of CTRL-M at Swindon by almost five orders of magnitude (Table 4), leading to a larger amplitude of Q H (cf. Fig. 3 in Best et al. 2006). Variability of the roughness ratio among the MORUSES runs is related to differences in local morphology.
Large seasonal variations of modelled Q E in London (Fig. 5e) are related to the presence of the River Thames and urban vegetation (Fig. 4). The CTRL and UKV-M (east) runs have considerable water-tile fractions (Table 1a). This is not the case for UKV-M (west) and HAD-B, for which f urban = 0.95 and 0.98, and Q E is strongly underestimated during the day. For HAD-B this results in near-zero median Q E in all seasons as only rainfall is available for evaporation, but removed quickly from the impervious surface (Section 6.4). Including the river and vegetation in HAD-B using the region T land cover would greatly improve Q E (Online Resource Fig. S4e), while in the absence of anthropogenic heat sources Q H would be less well predicted (Online Resource Fig. S4d).
Urban-tile evaporation in Best 1T and MORUSES could be enhanced by allowing for some pervious areas, but implications (e.g. regarding soil-moisture transport) are currently unclear, as are implications of prescribing urban irrigation or other anthropogenic moisture sources.
Observations indicate that KCL is not as strongly affected by evaporation from the river as its proximity suggests (Kotthaus and Grimmond 2014b). This is because of the river surface being vertically displaced below the local ground level and hence the effective height of the flux tower being greater relative to the river. Kotthaus and Grimmond (2014b) discuss that for advection over the river (100 • ≤ θ ≤ 230 • ), with a tidal induced variation of the water surface of ∼6 m, the eddy-covariance sensor heights can be above the river's growing internal boundary layer. This is most notable for wind directions at right angles to the river, i.e. region C (Fig. 4), whereas for flow from regions B and D evaporation from the river is more likely measured. This feature of the observations was also identified as a main factor contributing to the positive Q E bias in SUEWS at KCL (Ward et al. 2016).
The source-area sub-sampling of the CTRL runs results in the grid box Q E being mainly affected by the lake tile in summer, leading to a positive offset of the median diurnal cycle and a much larger 75th percentile compared with the observations. Otherwise, the CTRL runs are dominated by evapotranspiration from broadleaf trees (Table 1a). Since the UKV LAI is 0 for all PFTs (Section 3.5), JULES treats the vegetated tiles as bare soil, for which Q E shows a pronounced daily cycle. The river Q E has a weaker diurnal amplitude. The lower water fraction in UKV-M (west) results in the negative bias relative to UKV-M (east).
The LAI anomalies (Section 3.5) in Swindon affect the tree and shrub tiles in UKV-M, but not grass, which has the highest plan-area fraction (49% ; Table 1b). At this site, the CTRL runs with the lowest vegetation fraction have the best agreement of modelled Q E with the observations, but all configurations overestimate the latent heat flux in spring and summer (Fig. 6e). The fact that HAD-B in Swindon has the wettest soil (Online Resource Fig. S8b) appears to be a secondary effect to having the largest vegetation fraction when it comes to the over-estimation of the grid box Q E . Analysis of the grass-tile Q E shows that HAD-B, although it behaves similarly to the much drier UKV-M, has larger amplitudes of Q E in summer and autumn (not shown). Prescribing land-cover representative of the source area of the observations in HAD-B in Swindon reduces Q E and increases Q H , resulting in a skill comparable with CTRL-B (Online Resource Fig. S5d, e). However, climate applications are clearly dependent on appropriate ancillaries.

Diurnal variations of model performance
Time-of-day-dependent model performance (Section 5.2) for the turbulent heat fluxes is evaluated by season using the MAE (Eq. 4; Fig. 7). Corresponding MBE (Eq. 5) metrics are given in Fig. S6 of the Online Resource. At both sites, the largest cross-configuration variability occurs in the morning and afternoon. In Swindon, for all models, the afternoon MAE (Fig. 7b) is larger than for other periods of the day. Here, HAD-B has large |MBE| (Fig. S6b) for both Q H and Q E in the morning and afternoon. In spring and summer in London, all models except for CTRL-M have a large negative Q H bias in the morning (Fig. S6a). This is associated with the characteristic phase delay of the Best-1T model and the lags determined in the diurnal cycles of the UKV-M runs (Fig. 5d). Compared with suburban Swindon, the central London Q E bias is quantitatively more similar for different time periods (Fig. S6a).
The model errors for the two UKV-M configurations in London (east/west grids; Fig. 4) indicate that neither grid characteristics provide a consistently good representation of the actual source-area properties of the observations. The morning and afternoon MAE are reduced for UKV-M (west) as a result of the increased urban tile fraction (Fig. 7a), but the model tends to over-predict the afternoon Q H in summer (Fig. S6a). Q E , however, has the lowest errors for UKV-M (east) of all the configurations considered.
The transition periods around sunrise and sunset not exhibiting large absolute errors implies that the models respond accurately to changes in radiative forcing. However, relative errors may be large when the fluxes are small.

Performance summary
Taylor diagrams (Taylor 2001) summarise the model performance at London/KCL (Fig. 8a) and Swindon (Fig. 8b). Metrics shown for the entire evaluation period at both sites include the correlation coefficient (Eq. 7), the normalised standard deviation (σ P /σ O ), which is 1 for a perfect prediction, and the normalised centred root-meansquare error (nRMSE; Eq. 6; 0 for a perfect prediction).
For all energy-balance components and model configurations, predictions are poorer at the high-rise, high-density site in London, compared with suburban Swindon. At both sites, errors are larger for the turbulent heat fluxes than for the radiation components, with Q E showing the largest cross-configuration variability, particularly in London (Fig. 8a), which can be attributed to the geometric complexity of the site and the variations of the prescribed land-cover fractions between configurations (Table 1a). Even after using high-resolution morphology ancillaries, more realistic monthly Q F values and source-area subsampling, Q H and Q E in the CTRL-M/B runs still have large uncertainties in central London.
While the HAD-B performance is better for Swindon (Fig. 8b), model errors are consistently larger compared with the other configurations. Comparisons with CTRL-B show that this is in part from the HAD-B landcover ancillaries. Model skill can also be improved by using suitable bulk parameters (e.g. α U ) and, particularly in London, accounting for anthropogenic heat emissions. While the phase lag of the Best-1T scheme at the suburban site is masked by larger non-urban contributions to the grid box Q H , the delay strongly affects the performance in highly urbanised environments such as central London.

Contribution of Q F
The seasonal diurnal cycle of Q H in London (Fig. 5d) highlights the large impact of anthropogenic heat emissions on the local USEB when contrasting the CTRL runs with UKV-M and HAD-B. To explore the effect within the currently operational JULES-UKV framework, UKV-M (east) is repeated with (i) the Q F source switched off (as in HAD-B) and (ii) using monthly median (unscaled) GQF data (Fig. 2).
Switching off Q F entirely only slightly decreases the Q H model skill compared with the default run using monthly JULES-Q F values that are currently prescribed across the UK (Fig. 9). While the quantitative differences are small, qualitatively the effect of using JULES-Q F is not negligible as it leads to Q H < 0 W m −2 occurring less frequently in autumn and winter, which in turn affects temperature profiles above the city. Prescribing the GQF values distinctly improves the results (UKV-M GQF; Fig. 9), with the 24-h MAE and MBE values now being comparable with CTRL-M (Fig. 5d).
Diurnal and weekday/weekend variations of heat emissions have been removed from the prescribed GQF data by following the current JULES approach of using only monthly values (Section 3.6). At KCL, averaging the hourly GQF output results in smaller (greater) daytime (night-time) Q F values. This could explain some of the remaining differences to the observations in autumn and winter (UKV-M GQF, Fig. 9; CTRL-M, Fig. 5d), when Q F is both greatest and shows the largest temporal variability (Fig. 2).

MORUSES morphology controls
In the operational JULES-UKV, the empirical relations Gilham et al. 2019) used to obtain H , H/W and W/R as a function of f urban (Section 3.2) reduce the possible morphology variations from three degrees of freedom to one as the parameters are linked. The Section 6.1 MORUSES results demonstrate the influence of street-canyon morphology (H/W ) and roof-canyon partitioning on the grid box-averaged fluxes. While canyon geometry affects the bulk canyon parameters (e.g. α c , C c ), W/R determines the contributions of canyon and roof to the grid box average and, via the Macdonald roughness parameters, the value of z h . Figure 10 shows Q H and thermal inertia, C dT * /dt, modelled with MORUSES in response to varying (i) W/R (H/W = 1.18 fixed; Fig. 10a-c) and (ii) H/W (W/R = 0.44 fixed; Fig. 10d-f). The mean building height (H = 19.81 m) is held constant, which conceptually implies that for (i) both H and W remain fixed while the roof length changes, whereas for (ii) both canyon and roof lengths change so that both H and W/R remain constant.
The sensitivity test uses UKV-M (London/KCL) for a completely urbanised grid box (f urban = 1). The fixed parameter values of H , H/W and W/R are the CTRL values for region T ( Fig. 4; Table 2a), while thermal and radiative parameters are the JULES-UKV defaults (Table 2b, c). Results (Fig. 10) are shown for the median JJA diurnal cycles, but are qualitatively similar throughout the year.
In the sensitivity test, MORUSES computes z 0 and z d with the Macdonald method (Section 3.3). While z 0 depends on H , H/W and W/R, z d only varies with H and W/R and hence remains constant if H/W is varied (Eqs. 8,9). The MORUSES resistance network depends on H/W , z 0 and z d . Therefore, the resultant roughness lengths for heat (z h ; Eq. 10) of the canyon and roof tiles change throughout the analysis. For the results in Fig. 10, the corresponding curves for z 0 , z d and z h for varying W/R (Fig. 12a) and H/W (Fig. 12b) are discussed in Appendix 2.
The contrast between roof and canyon tile fluxes is drastic. With the very small heat capacity of the roof (Table 2c), the corresponding tile Q H has much larger amplitudes and peaks earlier in the day (similar in timing  Table 2a). The respective Best-1T curves for the urban tile (dotted lines with circles) are shown as a reference using CTRL-B parameters (Table 2) to Q * ; Fig. 5c) compared with the canyon (Fig. 10a,d), but contributes only marginally to the net storage heat flux (Fig. 10c, f).
For a very small canyon fraction (W/R = 0.01; Fig. 10a), the canyon z h defaults to 10 −30 m in this test (Fig. 12a), leading to a very low canyon Q H (Fig. 10a). The roughness length defaults to z 0m = 0.05 m for W/R ≤ 0.25, while the roof z h increases before it continuously decreases from a maximum at W/R = 0.25 (Fig. 12a), which leads to the initially increasing then decreasing peaks of the roof Q H (Fig. 10a). Once the canyon fraction is larger than ∼0.5, Q H and thermal inertia vary very little.
Increasing W/R successively reduces the diurnal amplitude of the grid box Q H and delays the timing of the peak values up to ∼3 h (Fig. 10b) as the contribution of the roof tile to the plan-area averaged fluxes decreases. For small canyon fractions, the grid box results reflect the quick response of the (insulated) MORUSES roof to radiative forcing, resulting in very high/low roof surface temperatures during the day/night. The larger thermal inertia of the canyon (Fig. 10c) means smaller diurnal variations of the canyon surface temperature and hence smaller temperature gradients between canyon and atmosphere. In all test scenarios, the nocturnal roof Q H is negative (Fig. 10a,d). For W/R between 0.1 and 0.9, there is a ∼100 W m −2 difference in the peak values of the grid box Q H and the diurnal amplitude decreases from ∼275 to ∼100 W m −2 (Fig. 10b) as the heat-storage amplitude increases (Fig. 10c). As the canyon fraction increases, MORUSES approaches the Best-1T modelling framework of a single impervious slab with large thermal inertia and the weighted ratio of canyon and roof z h to z 0 approaches the Best-1T value of 10 −7 (Fig. 12a).
Varying H/W at a fixed W/R = 0.44 in the isolatedbuilding regime (H/W ≤ 1/3) results in the model defaulting to z h = 0.1z 0h for canyon and roof (only at H/W = 0.3; Fig. 12b), which contributes to the nonmonotonic behaviour evident in the amplitudes of the tile and grid box Q H (Fig. 10d, e). The variation of the grid box Q H (Fig. 10e) between H/W = 0.3 (isolated-building regime) and 3.0 (skimming flow) is smaller compared with the response to changes in W/R, with differences in the daytime peaks of ∼60 W m −2 and diurnal amplitudes reducing by ∼100 W m −2 .
The large effect of H/W on the areal heat capacity of the canyon and therefore on its thermal inertia (Fig. 10f) affects both the amplitude and phase of the canyon Q H (Fig. 10d). As more heat can be stored with increasing H/W , temperature gradients become smaller during the day and larger at night when the heat is released (C c dT * /dt < 0). For H/W = 3.0, which is a typical value for highrise city centres, the storage heat flux into the canyon has become very large during the day (∼290 W m −2 ; Fig. 10f) so that Q H only starts to rise around noon and reaches its maximum at midnight (Fig. 10d). A similar pattern is present in the canyon Q H for region A of the CTRL-M run at KCL (H/W = 1.66; W/R = 0.41; Table 3; not shown). The night-time canyon Q H increases by approximately a factor of 2 when increasing H/W from 0.3 to 3 (Fig. 10d). Although the presence of the roof dampens this response, the grid box-averaged Q H is still increased noticeably at night (Fig. 10e).
The analysis shows that the availability of accurate urban morphology ancillaries is crucial if MORUSES is to represent the urban energy-balance successfully. Constraining the proportion of the canyon, which can be more easily derived from satellite data, has a larger impact on the results than H/W . Repeating the sensitivity test by allowing H to vary to obtain different W/R or H/W leads to quantitatively similar results (not shown).

Meteorological controls
The hourly model bias (P i − O i ) of the turbulent heat fluxes and L ↑ from UKV-M in London (east grid) and Swindon is analysed as a function of meteorological forcing, focusing on air temperature (T air ; Fig. 11a, b), horizontal wind speed (U h ; Fig. 11d) and rainfall (hours after rain; Fig. 11e, f). Evaluating the bias with horizontal wind direction (θ , Fig. 11c), unaccounted for in the JULES tiling approach, is useful to assess the impact of subgrid-scale land-cover heterogeneity.
Atmospheric stability is indicated using the stability parameter ζ = (z − z d )/L, where L is the observed Obukhov length and z − z d is the effective height of the observations, equivalent to the JULES forcing height (Section 4). Stability bins are selected as follows: |ζ | ≤ 0.05 (near neutral), ζ < −0.05 (unstable), ζ > 0.05 (stable). Over the evaluation periods, stable atmospheric stratification occurs more often in suburban Swindon (15.6%) than in central London (0.9%). In London, the observed prevalent stability condition is unstable (74% versus 16.3% in Swindon), while in Swindon near-neutral conditions prevail (68.1% versus 25.1% in London).
The bias of Q H driven by T air (Fig. 11a) differs between the two sites. The increasing negative model bias with decreasing T air in central London supports the earlier assessment of the large importance of Q F . Once T air is greater than ∼18 • C, the median Q H bias is strongly reduced, while the overall scatter increases. For Swindon, where Q F is much lower (Fig. 2), the median bias varies little over a wide range of forcing temperatures. At both sites, the statistical uncertainty of the binned bias at relatively high air temperatures (T air > 25 • C) is affected by reduced sample sizes (100 > N ≥ 20).
The variation of the L ↑ bias with T air (Fig. 11b) at both sites indicates that the surface is warmer in the observations compared with the model. In Swindon, the bias is relatively uniform across the range of forcing temperatures once T air > 0 • C, while in London the magnitude of negative bias increases for T air > 20 • C. At KCL, the radiometer source area contains mostly buildings and roads, while in Swindon L ↑ is affected both by impervious and vegetated surfaces (Ward et al. 2013). This could indicate that the model bias is of systematic nature.
As expected, given the fairly homogeneous land-cover distribution around the Swindon site (Ward et al. 2013), the Q H bias shows very little variation with wind direction (Fig. 11c). In London, however, there is a distinct response to the patchiness of the distribution of buildings, water and vegetation surrounding the site (Fig. 4). The reduction of the negative median Q H bias as the flow approaches the flux tower from the river (100 • ≤ θ ≤ 230 • ) can result from (i) the increased importance of the observed Q E relative to Q H and (ii) decreasing contributions of Q F . This indicates that the UKV-M (east) results (Fig. 5) for the given tile distributions (Table 1a) represent the site characteristics best for southerly wind directions, when the observations are less affected by Q F and contain signatures of the river, while a fetch over the urbanised north is associated with too low Q H .
Evaluating the Q H bias as a function of wind speed (Fig. 11d) is useful to identify the relative importance of larger scale horizontal advection of heat and moisture. While advection affects the observed turbulent heat fluxes, it cannot be modelled offline. It is noted that the different wind-speed characteristics in London and Swindon are affected by the measurement height of the observations (Section 4). At both sites, the variation of the median Q H with U h is low over the range of wind speeds observed, while the bias range (between 5th and 95th percentiles) increases after U h exceeds ∼4 m s −1 . This is also the case for Q E (not shown). This could indicate that for high wind speeds, advection can become important especially if the offline results are evaluated over shorter (non-climatological) periods. Furthermore, under neutral stability conditions (high wind speeds), the flux footprint becomes much larger (Fig. 7 in Kotthaus and Grimmond 2014b), which introduces changes in the observed heat-flux characteristics.
In London, high wind speeds are more likely to occur from the dominant wind-direction sector of 180-225 • (covering region B; wind rose in Fig. 4), which is associated with a larger positive Q H bias in UKV-M (east) (Fig. 11c) as evaporation from the river is more likely to affect KCL for flow from this direction. The larger bias variability for high wind speeds at KCL (Fig. 11d), in particular the tendency towards an over-prediction of Q H (increased 95th percentiles), is likely impacted by these changes in energy partitioning with θ . Wind speed also affects Q H via the MORUSES resistance network. The resistance to scalar transport is inversely proportional to the modelled canyon velocities (Harman et al. 2004). These scale with the velocity at the top of the canyon that is derived by interpolation from the forcing wind speed. Higher wind speeds imply lower resistances and thus higher turbulent fluxes. The interpolation to the canyon top critically depends on z d and z 0 , which MORUSES estimates from the Macdonald method that can underestimate the surface roughness in heterogeneous building environments (cf. Section 3.3).
To determine the model response to precipitation, the Q E bias is evaluated as a function of hours after the last rainfall event (Fig. 11e). For short periods after rain, evaporation from the impervious urban tiles can be expected to make the largest relative contribution to the overall latent heat flux. In London, where impervious surfaces contribute with 79% to the prescribed land cover (UKV-M east; Table 1a), the variability over the first 12 h following rain is large and followed by a negative median bias over the next 12-h interval, indicating that the model represents conditions as drier than observed. This feature is not as clear in Swindon, where f urban is only 0.31 (Table 1b) and the vegetation contribution to Q E is dominant. When evaluating only the first 12 h after rain (Fig. 11f), at both sites there is an increased positive model bias of Q E over the first 1 to 2 h, which could indicate that the urban tiles dry out too quickly.
JULES partitions precipitation into throughfall, interception, runoff and infiltration (Essery et al. 2001). On the urban tile(s), precipitation reaches the surface without interception and the partitioning of rainwater into infiltration and runoff strongly favours the latter. Surface infiltration is curbed by using a low (0.1) attenuation factor for the saturated hydraulic conductivity, which together with the low value of prescribed surface water capacity (0.5 kg m −2 ) results in relatively large urban runoff (Eq. 48 in Best et al. 2011). Runoff characteristics of the JULES-UKV (CTRL-M/B, UKV-M) and JULES-GL7.0 (HAD-B) are affected by the use of different models to represent soil-moisture heterogeneity (Appendix 1). While JULES-UKV uses PDM, JULES-GL7.0 uses TOPMODEL (for details see Best et al. 2011), which treat the saturation-excess runoff differently. At London/KCL, all configurations generate large runoff rates within a single model time step, which strongly limits evaporation (Online Resource Fig. S7a). The increased vegetation fraction in Swindon reduces the grid box runoff as rain can be intercepted and held by the vegetation canopy (Online Resource Fig. S7b). At both sites, runoff is generated for precipitation rates exceeding ∼0.9-1.2 mm h −1 (JULES-UKV) or ∼1.3-1.7 mm h −1 (JULES-GL7.0) based on a linear fit of the data.

Conclusions
The Best-1T scheme and the 2-tile canopy model MORUSES in JULES are evaluated using multi-year flux observations in the highly urbanised city centre of London and in suburban Swindon, UK. The model performance is studied under different forcing conditions and using different resolution ancillary information (land cover, building morphology). Simulations based on the JULES science configurations and ancillaries used in the UM for regional NWP (JULES-UKV; 1.5 km resolution) and global high-resolution (∼10 km) climate simulations (HadGEM; JULES-GL7.0) are compared with control runs that use model parameters derived from observations and highresolution (∼1 m) GIS data and more representative Q F . With respect to the study aims, the following conclusions are drawn: 1. Implications of the JULES-ULSM options and resolution/accuracy of ancillaries 2. Potential for model improvements to support model outputs being used for integrated urban services • Anthropogenic forcing: Providing a more realistic and flexible way of representing anthropogenic heat emissions is essential to improve the performance of the JULES ULSMs, particularly for applications in highdensity city centres. This critically needs to include spatial and temporal variations of Q F either through ancillaries or (better) through modelling and should allow for the representation of the interplay between human activity and the urban micro-climate (e.g. feedback between the use of space heating/cooling and outdoor air temperatures). Such options are currently being explored. Similarly, anthropogenic emissions of water (e.g. through street cleaning or irrigation of urban vegetation) can be important in some cities (e.g. Ao et al. 2018;Dou et al. 2019), but are currently not modelled in JULES. • Land-cover information: Providing accurate land-cover information is crucial for the model performance. In many cities, urban vegetation (street trees, parks) or water bodies make a non-negligible contribution to the local urban climate. Not representing the subgrid scale land-cover variability in cities in the model ancillaries can lead to an under-representation of the urban Q E . In such cases, applications that rely on heat-flux ratios (e.g. Bowen ratio; evaporative fraction) are strongly impacted, which can affect estimates of water requirements for urban vegetation (e.g. irrigation planning) or health-related applications (e.g. heat stress; thermal comfort). • Morphology information: The operational UKV MORUSES morphology parameterisations were derived for London. Hence, access to high-resolution building morphology data is necessary if the model is used elsewhere. For UK applications, it also has to be considered that the empirical relations were derived for ∼1-km NWP applications, and it is unclear whether they are still appropriate at much higher or lower resolutions. Over London, the UKV is currently run routinely at ∼300-m horizontal resolution, and with even higher resolved simulations being used (∼50-100 m; Lean et al. 2019) the derivation of morphology ancillaries should be revisited and evaluated. Using a representative canyon proportion (W/R) in MORUSES can strongly affect model behaviour and hence is important for NWP and climate modelling. However, for climate services, where it may be crucial to represent processes at street-canyon scale, accuracy of parameters linked to urban geometry (e.g. H/W ) is required. • Aerodynamic roughness: Current MORUSES model formulations effectively restrict the maximum allowed displacement height to the mean building height in the grid box. However, z d can be large in cities with tall buildings and/or a large heterogeneity of building heights. Reflecting the impact of increased drag on flow profiles is crucial for the modelling of heat exchange between surface and atmosphere, which affects the estimation of thermal stress/comfort in urban environments. For applications in high-rise cities, it needs to be explored whether methods that account for building-height variations are more suitable. • Urban vegetation modelling: Even small vegetation fractions in cities can strongly affect the energy partitioning (Loridan and Grimmond 2012a;2012b), and accurate information on LAI and its seasonal variation is crucial to model evaporation correctly. Urban vegetation and water can be included by assuming a certain amount of both are present in the source urban data when it is processed into the JULES land-cover ancillaries. The urban vegetation and water are then represented although aggregated within the other (non-urban) JULES surface types in the grid box. Representing urban vegetation as separated from the USEB in JULES removes important process interactions, such as the impact of street trees on canyon aerodynamics (e.g. increased surface roughness; flowregime changes), radiation balance (e.g. more shading; reduced sky-view factor; impact on bulk canyon albedo/emissivity) and heat-flux partitioning. These could be important in scenario studies, for example related to urban greening strategies as mitigation measures for thermal stress. Grimmond et al. (2011) found that ULSM performance of turbulent heat fluxes has a larger variability if vegetation is treated on separate tiles compared with using an integrated approach. The potential of further model improvements by including vegetation in MORUSES should be explored (Young et al. 2015;Young 2018).
The study documented the temporal (seasonal, diurnal) variability of model skill and the necessity to evaluate ULSMs over a range of meteorological conditions. Contrasting the effect of different resolution ancillaries on the results showed that for integrated urban services that are informed by climate-model output an appropriate downscaler should be used with suitable resolution land-cover information rather than using the output from the driving model directly. Hence, the configuration chosen needs to be appropriate for the purpose of the task. It also needs to be explored how the offline results will translate to coupled (online) simulations of UM-JULES over similar (multi-year) evaluation periods. There is also further need to evaluate the models in other high-density cities, in particular with tall-building canopies, and, once current modelling capabilities have been expanded, for different seasonality and types of anthropogenic forcing.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/.  Table 5 Overview of (a) forcing and (b) evaluation variables observed in London ('L'; Kotthaus and Grimmond 2014a;Kotthaus and Grimmond 2014b;Bjorkegren et al. 2015) and Swindon ('S'; Ward et al. 2013