Anthropogenic heat flux: advisable spatial resolutions when input data are scarce

Anthropogenic heat flux (QF) may be significant in cities, especially under low solar irradiance and at night. It is of interest to many practitioners including meteorologists, city planners and climatologists. QF estimates at fine temporal and spatial resolution can be derived from models that use varying amounts of empirical data. This study compares simple and detailed models in a European megacity (London) at 500 m spatial resolution. The simple model (LQF) uses spatially resolved population data and national energy statistics. The detailed model (GQF) additionally uses local energy, road network and workday population data. The Fractions Skill Score (FSS) and bias are used to rate the skill with which the simple model reproduces the spatial patterns and magnitudes of QF, and its sub-components, from the detailed model. LQF skill was consistently good across 90% of the city, away from the centre and major roads. The remaining 10% contained elevated emissions and “hot spots” representing 30–40% of the total city-wide energy. This structure was lost because it requires workday population, spatially resolved building energy consumption and/or road network data. Daily total building and traffic energy consumption estimates from national data were within ± 40% of local values. Progressively coarser spatial resolutions to 5 km improved skill for total QF, but important features (hot spots, transport network) were lost at all resolutions when residential population controlled spatial variations. The results demonstrate that simple QF models should be applied with conservative spatial resolution in cities that, like London, exhibit time-varying energy use patterns.


Introduction
Anthropogenic heat flux (Q F ) can be a substantial input into the urban energy balance, especially in mid- and high-latitude cities during the autumn, winter and early spring, when it matches or even exceeds the incoming radiant flux in some areas of a city (e.g. Kikegawa et al. 2003; Hamilton et al. 2009). Despite its magnitude, Q F is difficult to measure directly because it is incorporated into measured sensible, latent, storage and long-wave radiative fluxes. Q F is therefore commonly estimated using empirical models (e.g. Sailor and Lu 2004; Smith et al. 2009; Allen et al. 2011; Iamarino et al. 2012) that simulate the metabolic (Q F,M ), transportation (Q F,T ) and building (Q F,B ) components of Q F .
Modelling methodologies for Q F are either top-down, bottom-up or a combination of the two. Top-down models (e.g. Sailor and Lu 2004;Allen et al. 2011) combine energy consumption or statistical parameters summarising large areas with supplementary data to produce estimates at finer spatial and temporal scales. Bottom-up approaches model emissions from individual buildings (e.g. Bueno et al. 2012) or road links (Smith et al. 2009) and apply these across large areas. A combination of the two (e.g. Iamarino et al. 2012) may be used to balance the desirable accuracy of bottom-up methods with the practicality of the top-down approach, given comprehensive input data are not always available.
Top-down methods are appealing because (i) they allow studies in regions where data are scarce, (ii) they are applicable to large geographical areas that can be incorporated into numerical weather prediction or global/regional climate models (e.g. Flanner 2009) and (iii) their relative simplicity makes them computationally tractable. These methods generally use a spatial dataset of local population density (e.g. Sailor and Lu 2004; Allen et al. 2011) to distribute national/regional per-capita energy consumption and vehicle ownership statistics within the domain of interest. The availability of world-wide gridded population datasets (e.g. GPWv4, Center for International Earth Science Information Network 2016; GHS, Pesaresi 2015) means that Q F can theoretically be modelled at sub-kilometre resolution. This makes its use in urban energy balance models appealing, but the potential for inaccuracy rises with finer resolution since the input data may be insufficient to reflect energy consumption across a city at such scales. A Q F model is considered skilful in this analysis if the resulting spatial distributions accurately reproduce a reference model or dataset in different Q F regimes: areas with (i) zero emission, (ii) “hot spots” (regions with the most intense emission) and (iii) intermediate intensities. Different Q F regimes and/or components are of interest for different applications, so a skill score must be able to objectively compare them by incorporating the abundance and spatial and temporal accuracy of emissions in each regime.
Previous studies comparing Q F model results use subjective assessments of spatial assumptions. For example, Allen et al. (2011) note that using residential population datasets to attribute urban emissions may not accurately represent the true energy use pattern. Quantitative comparisons focus on extreme values or large-area averages rather than spatial accuracy or bias: Sailor et al. (2015) compare maximum values and city-wide averages; Dong et al. (2017) discuss global mean values (and note the limitation of this approach); Lindberg et al. (2013) demonstrate how the origin and resolution of residential population data affect cumulative Q F . While broadly informative, these strategies leave an incomplete understanding of the merits and limitations of different modelling approaches.
In this study, the Fractions Skill Score (FSS) and supplementary analyses are used to evaluate skill objectively and to establish the spatial resolution and time(s) of day at which a simple model produces skilful estimates. A detailed Q F model (GreaterQF; Iamarino et al. 2012) is used as a reference standard against which the skill of a top-down model (LUCY; Allen et al. 2011;Lindberg et al. 2013) is rated.
The LUCY and GreaterQF models have been reimplemented (under the Urban Multiscale Environmental Predictor (UMEP) suite; Lindberg et al. 2018) to interface with the Quantum GIS Geographical Information System (2016) and to use standard GIS datasets as inputs. The models are termed “LQF” (LUCY) and “GQF” (GreaterQF) hereafter to emphasise the new implementations presented here, which make it easier to update input data to represent different historical or hypothetical conditions. Information about the FSS and models is provided in Sect. 2, and the skill with which LQF predicts Q F , Q F,T , Q F,M and Q F,B is discussed in Sect. 3, along with further analyses that apply different spatial considerations.

Overview of models
The modifications and improvements in LQF and GQF include pre-processing of input data to attach weightings to each grid cell in the model domain, with grid cell size defined by the spatial resolution of population count data used as input. The weightings are used to apportion vehicle activity, building energy consumption and metabolic activity, from which Q F emissions are estimated (Appendix 2). Weightings in LQF are based only on residential population density, while in GQF the weightings combine population density, energy consumption (broken down by fuel and consumer type), total road length (by road class) and traffic flow (by road link and vehicle type).
LQF contains a user-editable georeferenced database of historic national energy consumption, vehicle ownership (car, motorcycle and freight/bus) and total population statistics. Each grid cell receives a share of the energy consumption and vehicles based on the proportion of national population it contains. Building energy use depends on mean daily air temperature recorded for the study area via an empirical scaling function (Lindberg et al. 2013), which does not conserve the annual per-capita energy consumption derived from the database. Metabolism is simulated using the residential population at all times of day. Hourly profiles of building energy use, metabolic activity and road traffic variations are applied to capture the diurnal variations of these energy sources.
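The population-share apportioning described above can be illustrated with a minimal sketch. This is not the LQF source code: the function name, argument names and all numerical inputs except the UK population figure are hypothetical, and the temperature scaling and hourly profiles that LQF also applies are deliberately omitted.

```python
import numpy as np

def apportion_by_population(cell_pop, national_pop, national_energy_j_per_day, cell_area_m2):
    """Daily-mean flux (W m^-2) per grid cell from a national daily energy total.

    Each cell receives the share of national energy equal to its share of
    national population (a simplification of the top-down approach).
    """
    cell_pop = np.asarray(cell_pop, dtype=float)
    share = cell_pop / national_pop                    # fraction of national population
    energy_j = share * national_energy_j_per_day       # J per day released in each cell
    return energy_j / (86400.0 * cell_area_m2)         # J/day -> W, then per unit area

# Illustrative inputs only: a 2 x 2 grid of 500 m cells and a made-up energy total
pop = np.array([[0, 1000], [2500, 500]])
flux = apportion_by_population(
    pop,
    national_pop=62.036e6,             # UK population quoted in the text
    national_energy_j_per_day=1.0e16,  # hypothetical value
    cell_area_m2=500.0 * 500.0,
)
```

Under this scheme an unpopulated cell always receives zero emission, and the ratio of fluxes between two cells equals the ratio of their populations, which is why the spatial pattern of LQF output mirrors the residential population map.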
All of the energy input into LQF and GQF is assumed to be released as latent, sensible and/or wastewater Q F . GQF uses city-scale data to estimate emissions: Building emissions are calculated using spatially resolved annual energy consumption (electricity and gas), which distinguish non-domestic and domestic consumption within the city. These are disaggregated to a common spatial resolution using workday and residential population data, respectively. Daily building energy consumption variations are generated based on electricity and gas demand data, and separate non-domestic and domestic half-hourly profiles are applied to capture diurnal variations. Unlike in LQF, annual per-capita building energy consumption is conserved in GQF. Road traffic fuel use for eight vehicle types is built up from a detailed map of road links, and vehicle-specific profiles are applied to elicit diurnal variations. Metabolic emissions are represented using the residential population at night and the workday population on working days, with a transitional period between them. Each grid cell therefore contains different proportions of transport, building and metabolic Q F .

Model configuration
The LQF and GQF models were run for Greater London, UK. The study area (residential population 8.2 M, workday population 8.67 M; Office for National Statistics 2016) extends 60 × 45 km at its widest points and contains diverse land uses and substantial population variations. Data requirements are met for both of the models. A substantial change in population distribution occurs between night (Fig. 1c) and working days (Fig. 1e). This leads to large workday enhancements in the city centre and far west and modest but widespread population decreases in the majority of the city (Fig. 1h). The road network contains arterial routes, an inner ring-road in the northern half of the city and an outer highway partly intersecting the edge of the study area. These features are visible in the daily mean vehicle fuel consumption estimate produced by the GQF pre-processing steps (Fig. 1f).
The spatial accuracy of LQF is evaluated on a working day: Tuesday, 5 May 2015. This falls during British Summer Time (UTC + 1, which is explicitly captured in both models). On this date, the mean air temperature recorded in central London was 9.03°C, cool enough to trigger building heating calculations in LQF. Year-long model runs were also performed to assess LQF predictions of daily building energy consumption relative to GQF.
Fig. 1 (panel titles): a; b; c Gridded residential population density; d OA residential population density; e Gridded workday population density; f Gridded road fuel consumption; g Paved land cover fraction; h Workday to residential population ratio

GQF and LQF model parameters are summarised in Appendix 1, along with details of the input data sources used. LQF was assigned one transport and one building energy consumption diurnal profile, which are averages of those used in GQF, weighted respectively by the abundance of each vehicle type (outer London; DEFRA 2014) and energy consumer (city-wide values; DBEIS 2016). The profiles and GQF outputs were temporally averaged to match the 60-min time resolution of LQF. Both models were configured to generate Q F at 500 m spatial resolution to approximately match that of world-wide population datasets such as Gridded Population of the World v4 (GPWv4; Center for International Earth Science Information Network 2016).
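The profile averaging described above might be sketched as follows. The function names are hypothetical and the renormalisation to unit sum is an assumption about how a diurnal profile is expressed. The second helper assumes half-hourly values are simply averaged in pairs to reach the 60-min resolution.

```python
import numpy as np

def weighted_mean_profile(profiles, weights):
    """Abundance-weighted mean of several diurnal profiles.

    profiles: array of shape (n_types, n_steps), one row per vehicle or consumer type
    weights:  relative abundance of each type
    The result is renormalised to sum to 1 (an assumption of this sketch).
    """
    profiles = np.asarray(profiles, dtype=float)
    mean = np.average(profiles, axis=0, weights=np.asarray(weights, dtype=float))
    return mean / mean.sum()

def half_hourly_to_hourly(values_30min):
    """Average consecutive pairs of 30-min values to give 60-min values."""
    v = np.asarray(values_30min, dtype=float)
    return v.reshape(-1, 2).mean(axis=1)

# Toy example: two types with equal abundance, two time steps
profile = weighted_mean_profile([[1.0, 1.0], [3.0, 1.0]], [1.0, 1.0])
hourly = half_hourly_to_hourly([1.0, 3.0, 5.0, 7.0])
```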
Modifications are applied to LQF and GQF to quantify the change in skill from varying spatial input datasets (Table 1 contains a breakdown of input data used for each variant):
• GQF and LQF: standard model configurations
• “LQF-Paved”: as LQF, but using remote-sensed paved area fraction data to estimate the spatial distribution of Q F,T
• “GQF-Simplified”: spatially resolved energy consumption input to GQF is replaced with city-wide values to document the effect on skill.
Hourly spatial distributions of total Q F were produced by GQF and LQF for the whole of 5 May 2015. Results from the 16:00 to 17:00 UTC time step are shown in Fig. 1a, b as examples. Emissions range from zero to more than 100 W m −2 over the city, with different spatial distributions resulting from each model treatment.

Input data
Per-capita building energy use
Per-capita building energy consumption is estimated by LQF using UK total population (62.036 M) and energy consumption statistics (Allen et al. 2011; Lindberg et al. 2013), both from 2010. In comparison, London-specific energy consumption (2014) and population (2011) data are used to calculate per-capita building energy use in GQF. Appendix 1 contains full listings of LQF and GQF input data sources and parameters.
The UK total national primary consumption (PC) in the existing LUCY database was 3.03 times greater per-capita than the city-specific value input into GQF (Fig. 2a) and 2.92 times greater than an alternative estimate of London energy consumption with transport excluded from 2014 (DECC 2014). This disparity is dominated by the use of the total primary energy supply (IEA 2016) in LUCY, which includes energy production minus exports and losses rather than consumption.
The 2010 UK Total Final Energy Consumption minus transport fuel, or NTFC (IEA 2016), differed from the GQF value by only 1.2% and is therefore used in LQF here instead. Changes in consumption between 2010 (LQF input) and 2014 (GQF input) are also likely to contribute to the difference between model inputs, but a full evaluation of top-down energy consumption estimates is beyond the scope of this work.
Transport fuel consumption
Published aggregated total fuel consumptions for the UK and London were unsuitable for modelling road transport emissions because they included aviation and rail transport. Instead, GQF-estimated per-capita road transport fuel consumption is based on a map of traffic volume per road link from 2013 (London Datastore 2014) and is 1.4 times greater than the equivalent LQF estimate (Fig. 2b) from UK per-capita vehicle ownership statistics (Worldmapper 2006).
Day-to-day variations
Daily mean temperature data for LQF were measured in central London during 2015 by a Davis Vantage Pro 2 Plus weather station atop Barbican Cromwell Tower, a 42-storey residential block (51.521°N, −0.0930°E, 145 m a.s.l.). Daily gas and electricity demand variations for GQF (National Grid 2016a, b) were also sourced for 2015. Section 3.2.1 compares the day-to-day variation in building energy available in each model. LQF road traffic volume is reduced by 20% during weekends and public holidays. Diurnal and day-of-week traffic variations in GQF are governed by a single week-long profile at 30 min resolution.
Population data
Spatially resolved Greater London population maps from the 2011 UK census (ONS 2013) are used to allocate population and thus energy consumption across the spatial domains of both models. To represent the effect of grid resolution on skill, gridded population data are produced from the census output area (OA) spatial units that vary in area (Fig. 1d). The populations were redistributed to a regular grid based on the plan area index of buildings according to satellite-derived land cover classification data (Marconcini et al. 2017), using the process described in Appendix 2. This yields residential and workday populations downscaled to 500 m resolution (Fig. 1c, e) for use in the two models.

Fractions skill score
The total Q F output by each model at 17:00 UTC (Fig. 1a, b) demonstrates several features that the evaluation must capture:
1. The different frequency and spatial arrangement of hot spots (emissions typically ranging from ~50 to over 100 W m −2 at 500 m resolution) towards the centre of London
2. Different spatial features between models, such as enhancements around major roads
3. The overall pattern of Q F , which is more intense in the city centre
The Fractions Skill Score (FSS; Roberts and Lean 2008) is a metric that allows forecast skill at different spatial scales to be quantified in light of spatial errors and bias. Originally applied to compare forecast and observed precipitation fields (e.g. Roberts and Lean 2008; Mittermaier and Roberts 2010), it has also been used to evaluate modelled cloud brightness temperatures (e.g. Griffin et al. 2017) and volcanic ash plumes (Harvey and Dacre 2016). The FSS is calculated for different regimes of interest (e.g. Q F intensity, see Table 2, rainfall rate or ash concentration) to highlight different aspects of model behaviour. In this study, regimes are selected based on Q F magnitude and typically occupy certain regions of the city. For example, Q F,B emissions are consistently higher in the city centre; therefore, the upper regimes are found here (Fig. 3a). For a given regime, FSS is estimated as follows:
1. Grid cells belonging to the regime are tagged.
2. A neighbourhood of fixed size surrounding every grid cell in the model domain is chosen.
3. Spatial accuracy is assessed in each neighbourhood (Roberts 2008). For the j th neighbourhood: (a) The fraction of tagged cells in the neighbourhood is calculated in the candidate model grid (M j ). (b) The corresponding value is calculated in a reference grid (O j ), which may be another model or observations.
4. The fractions are calculated in all N neighbourhoods and combined into the FSS (following Roberts and Lean 2008):
$$\mathrm{FSS} = 1 - \frac{\frac{1}{N}\sum_{j=1}^{N}\left(M_j - O_j\right)^2}{\frac{1}{N}\left(\sum_{j=1}^{N} M_j^2 + \sum_{j=1}^{N} O_j^2\right)}$$
The FSS is calculated for multiple regimes to evaluate different aspects of the model output. Ordinarily, neighbourhoods of varying size are evaluated to determine spatial accuracy across different scales. Spatial resolution rather than scale is of interest in this work, so a single-pixel neighbourhood is used with model outputs with progressively coarser resolutions. The FSS ranges from 0 to 1, where FSS < 1 indicates poorer skill but not the underlying reasons for it. The frequency bias (the between-models ratio of grid cell counts in a regime) is therefore calculated separately to aid in interpretation. A lower limit, FSS useful , was defined with the FSS to signify whether the candidate model gives an informative (“useful”) prediction of the reference:
$$\mathrm{FSS}_{\mathrm{useful}} = 0.5 + \frac{f}{2}$$
where f is the overall fraction of tagged cells in the reference grid, and 0.5 is the likeliest outcome if the forecast is random. FSS useful is half-way between the two.
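A minimal sketch of the single-pixel calculation is given below, assuming the standard Roberts and Lean (2008) form of the FSS and FSS useful = 0.5 + f/2. The function name and the boolean-mask interface are hypothetical and do not reproduce the code used in the study. With a single-pixel neighbourhood the fractions M j and O j are simply 0 or 1 in every cell.

```python
import numpy as np

def fss_single_pixel(candidate_mask, reference_mask):
    """FSS, FSS_useful and frequency bias for one regime.

    candidate_mask, reference_mask: boolean arrays that are True where a grid
    cell is tagged as belonging to the regime in each model grid.
    """
    m = np.asarray(candidate_mask, dtype=float)   # M_j (0 or 1 per cell)
    o = np.asarray(reference_mask, dtype=float)   # O_j (0 or 1 per cell)
    mse = np.mean((m - o) ** 2)
    worst = np.mean(m ** 2) + np.mean(o ** 2)     # largest possible MSE
    fss = 1.0 - mse / worst if worst > 0 else np.nan
    f = o.mean()                                  # tagged fraction in the reference
    fss_useful = 0.5 + f / 2.0
    bias = m.sum() / o.sum() if o.sum() > 0 else np.nan
    return fss, fss_useful, bias

# Same number of tagged cells in each grid, but only half of them overlap
cand = np.array([True, True, False, False])
ref = np.array([True, False, True, False])
fss, fss_useful, bias = fss_single_pixel(cand, ref)
```

In this toy case the frequency bias is 1 but the FSS is 0.5, below the FSS useful value of 0.75, so the loss of skill is due to spatial mismatch rather than to the number of tagged cells. If the candidate tags no cells at all in a regime, the FSS is 0.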
The FSS provides an intuitive measure of overlap between models. The impact of spatial consistency is visualised using three examples (Fig. 4b-d) wherein the two models produce similar numbers of cells within a regime (low frequency bias) but with varying degrees of spatial consistency.

Results
LQF and GQF were run for Greater London at 500 m spatial resolution, consistent with finer-scaled global population datasets (e.g. GPWv4). Results are compared on a typical working day (Tuesday, 5 May 2015) for each component of Q F and each hour of output. Model outputs are compared primarily at the 500-m base spatial resolution, with coarser resolutions considered by spatially averaging the outputs in post-processing. A maximum cell size of 5 km maintains a sample size over 50 grid cells in the 60 × 45 km domain. Skill at different model resolutions from 500 m to 5 km is compared during night-time (00:00-01:00 UTC) and daytime (06:00-07:00 UTC for transport, 11:00-12:00 UTC otherwise) to capture low/domestically dominated and high/non-domestically dominated emissions, respectively. Regimes (Table 2) are selected based on thresholds [W m −2 ] applied to both model grids. Thresholds are estimated using quantiles of the emission intensity found in the GQF output grid. The resulting FSS therefore indicates if emissions are of the correct magnitude, and whether emissions of a given magnitude occur in the correct locations. Separate thresholds are calculated for each time step, Q F component and spatial resolution.

Selection of regimes
The regimes are chosen such that the No-Q F and G99+ cases, respectively, evaluate spatial intermittency and extremes, while the G50−, G50+ and G90+ regimes evaluate whether the bulk of Q F values are predicted and assigned correctly, indicating if emissions are skewed higher or lower.
[Figure panel titles: a Buildings; b Transport]
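The spatial averaging and regime tagging can be sketched as below. The regime boundaries are assumed to sit at the 50th, 90th and 99th percentiles of the non-zero reference (GQF) emissions, which is inferred from the area fractions quoted in the text rather than taken from Table 2. The function names, and the choice to trim edge cells that do not fill a complete coarse block, are likewise assumptions of this sketch.

```python
import numpy as np

def coarsen(grid, factor):
    """Block-average a 2-D grid by an integer factor (e.g. 10 for 500 m -> 5 km).

    Rows/columns that do not fill a complete block are trimmed (an assumption).
    """
    g = np.asarray(grid, dtype=float)
    ny = (g.shape[0] // factor) * factor
    nx = (g.shape[1] // factor) * factor
    blocks = g[:ny, :nx].reshape(ny // factor, factor, nx // factor, factor)
    return blocks.mean(axis=(1, 3))

def regime_masks(grid, reference):
    """Tag cells of `grid` using thresholds taken from quantiles of `reference`."""
    g = np.asarray(grid, dtype=float)
    ref = np.asarray(reference, dtype=float)
    q50, q90, q99 = np.quantile(ref[ref > 0], [0.5, 0.9, 0.99])
    return {
        "No-QF": g == 0,
        "G50-": (g > 0) & (g < q50),
        "G50+": (g >= q50) & (g < q90),
        "G90+": (g >= q90) & (g < q99),
        "G99+": g >= q99,
    }

# Toy data: one empty cell plus 100 cells with emissions of 1..100 W m^-2
reference = np.arange(0, 101, dtype=float).reshape(1, 101)
masks = regime_masks(reference, reference)
coarse = coarsen(np.arange(16).reshape(4, 4), 2)
```

Applying the same reference-derived thresholds to both model grids means that a candidate model with emissions of the wrong magnitude will populate the regimes differently, which shows up as a frequency bias.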

Day-to-day variations
GQF uses historical demand data to estimate daily building energy consumption, while LQF makes predictions based on ambient temperature (Sect. 2.1.2). It is informative to establish the magnitude of the differences between these values, although resolving them is beyond the scope of this work. Based on model runs for all of 2015, the LQF/GQF city-wide daily building emissions ratio (Fig. 4e) varies from 0.6 to 1.5. The distribution of the ratio is bimodal about 1.1 and 0.8, and a seasonal breakdown (Fig. 5a-d) shows that this arises from systematic under-estimates by LQF in the autumn and strong bimodality in winter, compared to a central tendency about 1 in spring and summer. The mode at approximately 0.8 occurs because measured energy use generally exceeds that predicted by LQF when the temperature exceeds the LQF balance point (Lindberg et al. 2013), which de-activates artificial heating in the model calculations. These over/under-estimates are applied with equal weighting across all buildings in LQF, and therefore introduce frequency biases that reduce the FSS. The ratio is 0.97 on the study date (a typical mid-range value), so differences in building energy estimation methods are unlikely to confound the results of the spatial analysis.

Diurnal variations
The variation in city-wide mean Q F is consistent between models, with GQF and LQF (Fig. 3a, b) reaching maxima of 12 and 10 W m −2 at 09:00 UTC (respectively) on 5 May 2015. Building energy dominates Q F at all times, with transport emissions proportionally greatest between 06:00 and 07:00 UTC when building energy is still rising. Building emissions are greatest during the working day (non-domestic emissions) and evening (domestic emissions). The proportions shown in Fig. 5a vary spatially because GQF captures separate spatial patterns of transport, domestic and non-domestic emissions.

Total anthropogenic heat flux (Q F )
The time-of-day variation of FSS for total Q F (Fig. 6a) shows consistency between modelled emission peaks at night when  (Fig. 5). The G50− and G50+ regimes (50 and 40% of the area, respectively) are predicted with informative skill all day, but skill is lower during daytime; G90+ (9% of the area) is predicted informatively for much of the day but falls below the threshold of what is considered informative at 06:00, 07:00 and 15:00 UTC. The G99+ regime, which covers 1% of areas with non-zero Q F , is predicted with negligible skill at all times.
Corresponding frequency biases at 01:00 and 07:00 (Table 3, rows 1 and 2) show that LQF predicts no emissions in the G99+ regime, hence the absence of skill. Biases in the G90+ regime are consistent at both times; hence, the reduction in FSS at 07:00 is likely caused by a change in the spatial Q F distribution coinciding with increased traffic activity.
Progressively coarsening the spatial resolution, r, to 5 km (Fig. 6b) at 07:00 raises the skill of the G90+ regime to an informative level at 1 km. There is no such effect on the G99+ skill because the fixed boundaries of the model grid leads

Fig. 4 Kernel density estimate of total daily building energy emissions in LQF (E LQF ) relative to GQF (E GQF ) over a spring, b summer, c autumn and d winter months of 2015, and e the whole year (same x-axis scale on each). Values of E LQF /E GQF > 1 indicate over-estimates by LQF with respect to GQF. Vertical dashed line represents the ratio found on 5 May 2015, the focus of this study (n = 365). LQF building emissions vary with air temperature while GQF emissions use empirical demand data

Transport emissions (Q F,T )
The same approach is used to evaluate Q F,T emissions, with No-Q F regime included here to indicate whether the spatially intermittent structure of the road network (Fig. 1f) is predicted adequately by LQF.
None of the Q F,T regimes are predicted informatively at any time of day (Fig. 7a). LQF greatly underestimates the frequency of grid cells in the No-Q F , G90+ and G99+ regimes (Table 3, rows 3 and 4). The frequency of emissions in the G50− regime is overestimated by 45%, though the G50+ regime frequency is within 15% of the correct value. This produces excessive Q F,T away from roads and underestimates emissions near major roads by spreading the available energy too thinly, reflecting how the road network structure differs from the residential population density (Fig. 1c). Coarsening the resolution to 5 km does not improve skill to an informative level except in the No-Q F regime, which improves because emissions occur in all 5 km grid cells.
An improved spatial distribution was sought by normalising the available daily transportation energy to that in GQF and using remote-sensed paved area fraction data in place of population density. This gives rise to the LQF-Paved configuration, which increased the FSS only marginally at 500 m resolution (Fig. 7b) because paved areas (Fig. 1g) resemble an amalgam of population and road network (Fig. 1c, e, f) rather than just the road network. As with the standard LQF configuration, the LQF-Paved frequency biases at 01:00 UTC (Table 3, row 5) show strong under-representation in the No-Q F , G90+ and G99+ regimes, and over-prediction of the G50+ regime is strengthened. There is negligible bias in the G50− regime, so poor FSS here is caused by a lack of spatial consistency.

Building emissions (Q F,B )
Q F,B is predicted informatively by LQF at all times of day in the G50−, G50+ and G90+ regimes (Fig. 8a). Frequency biases at 01:00 and 07:00 UTC are under 3% in the G50− and G50+ regimes, and the G90+ regime is overestimated by 31% (Table 3, rows 6 and 7). As with total Q F , the Q F,B skill reduces during daytime when non-domestic emissions dominate in GQF, but not to the extent that skill falls below an informative level. Coarsening the spatial resolution to 5 km (not shown) does not increase the FSS in the G99+ regime. The structure of the G99+ regime may arise from three aspects of the GQF input data that are not captured by LQF:
1. Hour-to-hour energy consumption differs strongly between models because GQF uses sector-specific values and diurnal profiles.
2. The sector-specific building energy consumption datasets loaded into GQF are provided in spatially resolved form and generally show greater non-domestic consumption towards the city centre. In contrast, a city-wide consumption value is used in LQF.
3. Non-domestic energy consumption is disaggregated to the required spatial resolution using workday population, which has a different spatial structure to residential population (Fig. 1c, e). LQF disaggregates all energy consumption using residential population.

Fig. 7 As Fig. 6a, but for transport emissions using a LQF and b LQF-Paved, where daily emissions match GQF and emissions are assigned using paved area fraction rather than residential population

Hourly total building emissions in the two models were found to differ by less than 7%, suggesting (1) is not the primary cause of the lack of skill.
The GQF-Simplified configuration uses city-wide energy consumption totals instead of spatially resolved input files to assess the effect of (2) on the G99+ regime. The resulting FSS (Fig. 8b) represents the skill of GQF-Simplified at predicting GQF. The FSS across regimes ranges from 0.5 (G99+) to 0.95 (G50−) and remains at informative levels all day, although it should be noted that the G99+ regime is only marginally above the FSS useful threshold in the evening. The effect of GQF-Simplified is to reduce energy use in the city centre and the abundance of grid cells in the G99+ regime by 57-58% (Table 3, rows 8 and 9) at 01:00 and 07:00. This energy redistributes to grid cells in the G90+ regime, with a 23% increase in abundance. This corresponds to the energy from each lost G99+ cell being spread over 3.6 G90+ cells.
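The figure of 3.6 G90+ cells per lost G99+ cell can be checked with simple arithmetic, sketched below. The regime area fractions of 1% (G99+) and 9% (G90+) are assumed to carry over from the values quoted for total Q F , and the midpoint of the 57-58% reduction is used. The variable names are illustrative.

```python
# Assumed area fractions of the two regimes in the reference (GQF) grid
g99_area = 0.01
g90_area = 0.09

g99_loss = 0.575   # midpoint of the 57-58% reduction in G99+ cells
g90_gain = 0.23    # 23% increase in G90+ cells

lost_cells = g99_loss * g99_area      # fraction of the domain leaving G99+
gained_cells = g90_gain * g90_area    # fraction of the domain entering G90+

cells_per_lost = gained_cells / lost_cells
print(round(cells_per_lost, 1))  # 3.6
```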
LQF was compared to GQF-Simplified to evaluate (3). The G99+ regime of GQF-Simplified is still predicted with negligible skill by LQF (Fig. 8c), albeit with minor improvements from 17:00 to 00:00 UTC, again caused by an absence of cells in the G99+ regime (Table 3, rows 10 and 11). This indicates the lack of skill in upper regimes is a combination of factors (2) and (3).

Metabolic emissions (Q F,M )
GQF estimates night-time Q F,M using residential population and daytime emissions using workday population, with transitional periods during morning and evening and increased daytime metabolic activity per person. LQF uses residential population data at all times of day, and transitions to (from) increased metabolic activity during the morning (evening).
Workday and resident population differences lead to lower skill during the day than at night (Fig. 9), and the FSS rapidly worsens to non-informative levels at 05:00-07:00 and 22:00 UTC in the G50+, G90+ and G99+ regimes because day/night transitions take place differently in the two models. All regimes except the G99+ case are predicted with informative skill outside of transitional periods, and G99+ is predicted skilfully at 19:00-21:00 and 23:00. Non-informative skill in the Q F, M G99+ regime (Table 3, rows 12 to 14) is caused by frequency biases arising from different assumptions regarding resting metabolic rates and work schedules between models. The G99+ regime is over-predicted by 3.3 times at night because LQF assumes each person emits 75 W while GQF assumes 64.3 W. LQF predicts 14.6 times more cells at 07:00 UTC because its transition to active metabolic rates begins earlier than in GQF, which is conditioned on work/ home rather than sleep/wake. At 12:00 UTC, transitions are complete and workday population dominates in GQF and LQF predicts zero G99+ cells, reflecting the high localised density of the workday population. Coarsening the spatial scale to 5 km (not shown) did not improve skill in the G99+ regime.
Night-time disagreement between models is trivial to resolve by adopting a consistent resting metabolic rate, but the differences during transitions between rest, wakefulness and work reflect the different levels of model detail.

Spatial variation of skill
Spatial variation in LQF skill is visualised by calculating the proportion of the day for which each grid cell resides in a regime with informative skill. This reliability (Fig. 10) is labelled as “consistently informative” (FSS > FSS useful during all hours), “intermittently informative” (some hours) or “poor” (never).
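One way to compute such a reliability map is sketched below. The array layout (an integer regime index per cell per hour, and a per-hour, per-regime flag for FSS > FSS useful ) and the function names are assumptions made for illustration, not the procedure used to produce Fig. 10.

```python
import numpy as np

def reliability(regime_index, informative):
    """Fraction of hours for which each cell sits in an informatively predicted regime.

    regime_index: int array (n_hours, ny, nx), regime id of each cell at each hour
    informative:  bool array (n_hours, n_regimes), True where FSS > FSS_useful
    """
    regime_index = np.asarray(regime_index)
    informative = np.asarray(informative, dtype=bool)
    hours = np.arange(regime_index.shape[0])[:, None, None]
    cell_ok = informative[hours, regime_index]    # look up each cell's regime flag
    return cell_ok.mean(axis=0)

def label(fraction):
    """Map the reliability fraction onto the three labels used in the text."""
    return np.where(
        fraction == 1.0, "consistently informative",
        np.where(fraction == 0.0, "poor", "intermittently informative"),
    )

# Two hours, one row of two cells; regime 1 is uninformative in the first hour
regimes = np.array([[[0, 1]], [[0, 1]]])
flags = np.array([[True, False], [True, True]])
frac = reliability(regimes, flags)
labels = label(frac)
```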
Total Q F is consistently informative in approximately 90% of the city area (Fig. 10a). Areas around major roads in the north, east and west of the city are reduced to being intermittently informative where the unskilled transport emissions are the strongest. The city centre contains the poorly predicted G99+ regime, which is dominated by building and transport emissions.
For Q F,B (Fig. 10b), over 98% of the city falls into regimes predicted consistently, and the G99+ regime at the centre is predicted without skill. The difference between Fig. 10a, b highlights the effect of roads on overall skill. Areas near the centre are intermittently informative because some grid cells fall into different regimes over the day. Metabolism (Fig. 10c) from LQF is intermittently informative in the central half of the city, where the workday population dominates, and consistently informative in the residential outer 50%. Q F,T is not included because skill is consistently poor in all cases.

Skill, emissions and energy
The total energy available for building emissions deviated between LQF and GQF on days with a mean temperature over 12°C because LQF assumes no heating occurs in this regime.
The study date represented good agreement between models, and the bias introduced by this error reduces the FSS on other dates. It is stressed, however, that other prediction methods or empirical demand data could be used with the LQF approach instead.
FSS and area coverage are not related to the total energy contained within a regime (Table 2 contains a full breakdown of Q F intensities and energy partitioning). The G50+ regime contains ~50% of energy, G90+ around 25% and G99+ approximately 10%. The areal extent of a regime therefore does not reflect its energetic significance, and obtaining good skill in the upper regimes may be more important for urban energy balance considerations if the focus is on high spatial resolution or hot spots.

Spatial accuracy
Total city-wide emissions in each component are consistent between models. LQF reproduced much of the GQF Q F,B spatial variability with an informative level of skill in most areas but was unable to accurately reproduce the city centre hot spots present in GQF output, with energy instead spread elsewhere. Accurate hot spot prediction requires workday population data, spatially constrained energy consumption data and the ability to discriminate between domestic and non-domestic emissions:
1. Spatially resolved domestic and non-domestic building energy consumption constrain emissions within different regions of the city.
2. Workday and residential populations indicate likely fine-grained building energy demand patterns during day and night.
3. Separate diurnal profiles for the energy consumption datasets correlate emissions in different areas to particular times of day.
LQF predicted spatial variations of Q F,T poorly in all regimes and times of day, smoothing out emissions over unsuitably large areas at resolutions as coarse as 5 km even when disaggregated using paved area fraction instead of population density. A road network map is required to address this, and the use of crowdsourced vector data such as OpenStreetMap (2017) represents a potential avenue (subject to coverage) if assumptions are made about the division of traffic between major and minor roads. Q F,M is predicted with informative levels of skill at night if accurate assumptions are made about per-person emissions. Differences in assumed resting and working times cause large transient losses in skill in the morning and evening. Daytime Q F,M was predicted with poorer skill because LQF did not have access to workday population data, although only the hot spot regime fell below an informative level of skill during daytime.
Total Q F skill reflects that of the individual components. LQF is non-informative at all times of day in the city centre because of building-related hot spots and non-informative during some hours of the day near major roads and dense parts of the road network. Since transport contributes less energy than buildings, consistently informative total Q F skill can be obtained by coarsening the spatial resolution from 500 m to 1 km.
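The coarsening step mentioned above can be illustrated with a simple block average. This is a sketch under stated assumptions, not the procedure used in the study: it assumes Q F is held on a regular grid of equal-area cells (so that averaging a flux in W m⁻² conserves total energy) and that the grid dimensions are exact multiples of the coarsening factor. The function name `coarsen_flux` is hypothetical.

```python
import numpy as np

def coarsen_flux(qf, factor):
    """Block-average a 2-D flux grid (W m-2) by an integer factor.

    With 500 m cells, factor=2 gives 1 km cells. Equal-area cells are assumed,
    so the mean of the flux is unchanged by the operation.
    """
    qf = np.asarray(qf, dtype=float)
    ny, nx = qf.shape
    if ny % factor or nx % factor:
        raise ValueError("grid dimensions must be multiples of the factor")
    # Split each axis into (coarse index, index within block) and average the blocks.
    blocks = qf.reshape(ny // factor, factor, nx // factor, factor)
    return blocks.mean(axis=(1, 3))
```

Because each coarse cell is the mean of its constituent cells, a single hot-spot cell is diluted by its neighbours, which is the trade-off between skill and retained detail discussed in this section.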
GQF is likely to more accurately represent true Q F emissions than LQF; however, in this study, we cannot state how closely it matches reality.

Conclusions
A simple model (LQF) based principally on residential population and national statistics has limited accuracy at spatial resolutions from 0.5 to 5 km when compared with the output of a more detailed model (GQF), which uses city-specific parameters and distinguishes different energy uses:
• At the whole-city scale, building and road emissions were within ± 40% of city-specific values for individual days, with building emissions underestimated on warmer days.
• Elevated inner-city emissions, dominated by buildings, were displaced by LQF and hot spots were missed entirely. This is attributed to workday population and non-domestic energy use patterns, which LQF does not represent.
• Outer-city emissions were replicated reliably by LQF as they are dominated by domestic buildings.
• Transport emissions were predicted poorly throughout the city because population and paved area fraction data attributed emissions across too great an area.
• Metabolic emissions were captured skilfully by LQF except during transitions between rest and activity.
We recommend that, if detailed modelling is impractical because of limited input data, simple models based on residential population density patterns be used conservatively:
• Resolutions no finer than ~1 km should be used to mitigate the effects of a lack of population movement (e.g. from home to work) being modelled.
• Transport emissions should be based on road network maps rather than a proxy, especially where major orbital and trunk roads displace traffic volume from dense populations.
• The relation between temperature and energy use should be evaluated for each study city. Through this work, amongst other improvements to LUCY (now LQF), optional extra parameters have been added to the LQF software to permit this.
• Errors arising from misplaced emission hot spots should still be expected despite these measures.
As new techniques are developed to obtain Q F (e.g. Chrysoulakis et al. 2016), and high-resolution urban modelling uses Q F as an input (e.g. Loridan et al. 2010;Chen et al. 2011;Bohnenstengel et al. 2014;Best and Grimmond 2016), estimation methods must be evaluated objectively so that their appropriateness can be judged and their limitations addressed. Given the challenge of obtaining all the data necessary to run a detailed model like GQF, enriching a simpler LQF-type model with complementary data may be a more fruitful way of improving the quality of predictions.
Despite extensive inputs, the models discussed here are essentially static and do not explicitly consider the effect of localised or widespread disruptions to human activity. Developing methods to emulate the dynamics of human behaviour is therefore essential so that spatially heterogeneous Q F predictions can be made (Barlow et al. 2017). In turn, this will support urban energy balance and surface-atmosphere interaction modelling at progressively higher spatial and temporal resolutions.

Appendix 1. Model parameters
The following tables provide the parameters and units used in the models (Tables 4 and 6) and the data sources used (Tables 5 and 7).
Table note: parameters from Allen et al. (2011) and Lindberg et al. (2013). Heating/cooling response curve parameters (not shown) for the UK are as in Lindberg et al. (2013).
Table note: all taken from Iamarino et al. (2012) except for vehicle age, which was added in the current work and is assumed to be 5 years. Economy 7 electricity and crude oil parameters are omitted from this study as inputs to GQF and LQF.
The data used to populate the LQF database of national statistics are listed in Table 5.
Appendix 2. Algorithm used to produce gridded population and transport energy data

(Dis)aggregation of population/energy consumption data
The disaggregation algorithm redistributes $V_S$, a set of scalar quantities attributed to a set $S$ of “source” spatial units, to a set $T$ of target units that intersect them. These units are of arbitrary shape, and $S$ and $T$ may have different spatial extents to one another. Each target unit receives a share of the quantities $V_S$ according to the area of source unit(s) intersected and the value of a numerical weighting specified for each target unit. The process and rules to perform the disaggregation are outlined below. Each source unit is processed in turn, and its contributions to each target area are summed afterwards.
1. The $n$ target units spatially intersecting the $j$th source unit are identified. The $i$th target unit has an overall area $A_{T,i}$ and intersects an area $a_{T,i}$ of the source unit. The $n$ target units collectively overlap a fraction $F$ of the overall source area $A_{S,j}$.
2. The $i$th target unit receives $V_{T,i}$. This is a fraction of $V_{S,j}$ determined by a weighting $w_T$ assigned to each of the $n$ target areas. Weightings may be provided externally; otherwise, the overall target unit area is used (i.e. $w_{T,i} = A_{T,i}$). The term $W_{S,j}$ represents the total weighting over the source unit. This is either:
(a) Specified externally before processing, which allows $W_{S,j} \neq \sum_{i}^{n} w_{T,i}$, whereby $V_{S,j}$ is scaled up or down before being disaggregated.
(b) Calculated from the sum of the individual weights, taking into account the fraction $F$ to scale down $V_{S,j}$ if the target units do not completely cover the source unit.
(c) If $W_{S,j} = 0$, it is overridden with $W_{S,j} = 1/n$.
3. A target unit may intersect $m$ source units. In this event, the $i$th target area weighting $w_i$ used in Eq. 3 is first scaled by the proportion of the target unit area intersected by the source unit. The process therefore performs aggregation if a target unit bounds multiple source units and disaggregation if the source areas are larger than the target unit. The resulting value of $V_{S,j}$ is conserved across the $n$ target units (i.e. $\sum_{i}^{n} V_{T,i} = V_{S,j}$).
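The steps above can be sketched in code. The displayed equations are not reproduced in this text, so the sketch rests on one plausible reading of the verbal description, which should be treated as an assumption: $F = \sum_i a_{T,i} / A_{S,j}$, $V_{T,i} = V_{S,j}\, w_{T,i} / W_{S,j}$, $W_{S,j} = \frac{1}{F}\sum_i w_{T,i}$ in case (b), and each weight scaled by $a_{T,i}/A_{T,i}$ in step 3. The intersection areas are taken as precomputed inputs, and the function names and dictionary keys are hypothetical.

```python
def disaggregate_source(v_source, source_area, targets, total_weight=None):
    """Share one source value V_S among the target units intersecting it.

    targets: list of dicts with keys
        "id"      target unit identifier
        "area"    overall target area A_T
        "overlap" area of the target lying inside this source unit, a_T
        "weight"  optional external weighting w_T (defaults to A_T)
    total_weight: optional externally specified W_S (case a).
    Returns {target id: V_T}.
    """
    if not targets:
        return {}
    # Step 1: fraction F of the source area covered by the target units.
    covered = sum(t["overlap"] for t in targets) / source_area
    if covered <= 0:
        raise ValueError("targets must overlap the source unit")
    # Steps 2-3: weight per target, scaled by the share of the target
    # that lies inside this source unit (assumed form of the scaling).
    weights = {}
    for t in targets:
        w = t.get("weight", t["area"])
        weights[t["id"]] = w * t["overlap"] / t["area"]
    if total_weight is None:
        total = sum(weights.values()) / covered   # case (b), assumed form
    else:
        total = total_weight                      # case (a)
    if total == 0:
        total = 1.0 / len(targets)                # case (c)
    return {tid: v_source * w / total for tid, w in weights.items()}


def disaggregate_all(sources):
    """Process each source unit in turn and sum contributions per target."""
    totals = {}
    for src in sources:
        shares = disaggregate_source(
            src["value"], src["area"], src["targets"], src.get("total_weight")
        )
        for tid, share in shares.items():
            totals[tid] = totals.get(tid, 0.0) + share
    return totals
```

Under this reading, the source value is conserved when the target units fully cover the source unit ($F = 1$) and is scaled down in proportion to $F$ otherwise. In practice the overlap areas would come from a polygon intersection carried out in a GIS library.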

Aggregation of road network data (GQF)
The London Atmospheric Emissions Inventory (LAEI; London Datastore, 2014) supplies a detailed road segment map, with annual average daily traffic (AADT) specified on a per-segment and per-vehicle class (Table 8) basis. These segments are vector lines and are infinitesimally thin, so the data are transformed to fuel consumption in each output polygon. Total fuel consumption $F$ in an output polygon of area $A$ is obtained by summing contributions from each of the $m$ road segments passing through it. The total fuel consumption $f_v$ for vehicles of class $v$ is calculated as in Eq. 6, where $l_i$ is road segment length in kilometres, $D_{i,v}$ is the AADT of the vehicle class on the road segment and $\varepsilon_v$ is the fuel consumption of the vehicle class (kg km⁻¹; based on EURO-II derived estimates on urban roads; DEFRA, 1999).
Assumed fuel mix values are taken from Iamarino et al. (2012). Assumed fuel mix is subject to change with regulatory and technological advances.
Mean half-hourly Q F,T is estimated (Eq. 7) by summing across all $n$ vehicle types and applying the heat of combustion. The available data split vehicles into petrol and diesel variants, so the appropriate heat of combustion $q_v$ from Iamarino et al. (2012) is applied (44.7 and 47.1 MJ kg⁻¹ for petrol, net and gross of water vapour, and 43.3 and 45.5 MJ kg⁻¹ for diesel, net and gross of water vapour). Where data regarding the fuel split are not available, assumptions are made (Table 8). Half-hourly variations are incorporated by applying a diurnal scaling factor, which also varies from day to day to reflect reduced weekend traffic flow.
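The calculation described above can be sketched as follows. Eqs. 6 and 7 are not reproduced in this text, so the forms used here are assumptions inferred from the symbol definitions: $f_v = \sum_{i=1}^{m} l_i D_{i,v} \varepsilon_v$ (kg day⁻¹) and a mean flux of $\sum_v f_v q_v / (86400\,A)$ in W m⁻², multiplied by a diurnal scaling factor. Segment lengths are assumed to be those falling inside the output polygon. The function names, vehicle class labels and the fuel consumption figure in the usage note are hypothetical.

```python
SECONDS_PER_DAY = 86400.0

def fuel_per_class(segments, fuel_kg_per_km):
    """Daily fuel use (kg/day) per vehicle class within one output polygon.

    segments       : list of (length_km, {vehicle_class: AADT}) for each road
                     segment inside the polygon
    fuel_kg_per_km : {vehicle_class: fuel consumption in kg per km}
    """
    totals = {v: 0.0 for v in fuel_kg_per_km}
    for length_km, aadt in segments:
        for v, vehicles_per_day in aadt.items():
            totals[v] += length_km * vehicles_per_day * fuel_kg_per_km[v]
    return totals

def traffic_heat_flux(segments, fuel_kg_per_km, heat_mj_per_kg, area_m2,
                      diurnal_scale=1.0):
    """Transport heat flux (W m-2) for one polygon under the assumed Eq. 7.

    heat_mj_per_kg : {vehicle_class: heat of combustion in MJ/kg}
    diurnal_scale  : dimensionless factor for the half-hour of interest
                     (1.0 returns the daily mean)
    """
    fuel = fuel_per_class(segments, fuel_kg_per_km)
    joules_per_day = sum(fuel[v] * heat_mj_per_kg[v] * 1e6 for v in fuel)
    return diurnal_scale * joules_per_day / (SECONDS_PER_DAY * area_m2)
```

As a usage example, a single 1 km segment carrying 86 400 petrol vehicles per day at an illustrative 0.05 kg km⁻¹, with the net heat of combustion of 44.7 MJ kg⁻¹ quoted above, releases 2.235 MW on average; spread over one 500 m × 500 m cell this is about 8.9 W m⁻².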
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.