1 Introduction

Anthropogenic heat flux (QF) can be a substantial input into the urban energy balance, especially in mid- and high-latitude cities during the autumn, winter and early spring when it matches or even exceeds the incoming radiant flux in some areas of a city (e.g. Kikegawa et al. 2003; Hamilton et al. 2009). Despite its magnitude, QF is difficult to measure directly because it is incorporated into measured sensible, latent, storage and long-wave radiative fluxes. QF is therefore commonly estimated using empirical models (e.g. Sailor and Lu 2004; Smith et al. 2009; Allen et al. 2011; Iamarino et al. 2012) that simulate the metabolic (QF,M), transportation (QF,T) and building (QF,B) components of QF.

Modelling methodologies for QF are either top-down, bottom-up or a combination of the two. Top-down models (e.g. Sailor and Lu 2004; Allen et al. 2011) combine energy consumption or statistical parameters summarising large areas with supplementary data to produce estimates at finer spatial and temporal scales. Bottom-up approaches model emissions from individual buildings (e.g. Bueno et al. 2012) or road links (Smith et al. 2009) and apply these across large areas. A combination of the two (e.g. Iamarino et al. 2012) may be used to balance the desirable accuracy of bottom-up methods with the practicality of the top-down approach, given comprehensive input data are not always available.

Top-down methods are appealing because (i) they allow studies in regions where data are scarce, (ii) they are applicable to large geographical areas and can therefore be incorporated into numerical weather prediction or global/regional climate models (e.g. Flanner 2009), and (iii) their relative simplicity makes them computationally tractable. These methods generally use a spatial dataset of local population density (e.g. Sailor and Lu 2004; Allen et al. 2011) to distribute national or regional per-capita energy consumption and vehicle ownership statistics within the domain of interest. Because world-wide gridded population datasets are available (e.g. GPWv4: Center for International Earth Science Information Network 2016; GHS: Pesaresi 2015), QF can in theory be modelled at sub-kilometre resolution. This makes the approach appealing for urban energy balance models. However, the potential for inaccuracy rises with finer resolution, because the input data may be insufficient to reflect energy consumption across a city at such scales.

A QF model is considered skilful in this analysis if the resulting spatial distributions accurately reproduce a reference model or dataset in different QF regimes: areas with (i) zero emission, (ii) “hot spots” (regions with the most intense emission) and (iii) intermediate intensities. Different QF regimes and/or components are of interest for different applications, so a skill score must be able to objectively compare them by incorporating the abundance and spatial and temporal accuracy of emissions in each regime.

Previous studies comparing QF model results use subjective assessments of spatial assumptions. For example, Allen et al. (2011) note that using residential population datasets to attribute urban emissions may not accurately represent the true energy use pattern. Quantitative comparisons focus on extreme values or large-area averages rather than spatial accuracy or bias: Sailor et al. (2015) compare maximum values and city-wide averages; Dong et al. (2017) discuss global mean values (and note the limitation of this approach); Lindberg et al. (2013) demonstrate how the origin and resolution of residential population data affect cumulative QF. While broadly informative, these strategies leave an incomplete understanding of the merits and limitations of different modelling approaches.

In this study, the Fractions Skill Score (FSS) and supplementary analyses are used to evaluate skill objectively and to establish the spatial resolution and time(s) of day at which a simple model produces skilful estimates. A detailed QF model (GreaterQF; Iamarino et al. 2012) is used as a reference standard against which the skill of a top-down model (LUCY; Allen et al. 2011; Lindberg et al. 2013) is rated.

The LUCY and GreaterQF models have been re-implemented (under the Urban Multiscale Environmental Predictor (UMEP) suite—Lindberg et al. 2018) to interface with the Quantum GIS Geographical Information System (2016) to use standard GIS datasets as inputs. The models are termed “LQF” (LUCY) and “GQF” (GreaterQF) hereafter to emphasise the new implementations presented here, which make it easier to update input data to represent different historical or hypothetical conditions.

Information about the FSS and models is provided in Sect. 2, and the skill with which LQF predicts QF, QF,T, QF,M and QF,B is discussed in Sect. 3, along with further analyses that apply different spatial considerations.

2 Methods

2.1 Overview of models

The modifications and improvements in LQF and GQF include pre-processing of input data to attach weightings to each grid cell in the model domain, with grid cell size defined by the spatial resolution of population count data used as input. The weightings are used to apportion vehicle activity, building energy consumption and metabolic activity, from which QF emissions are estimated (Appendix 2). Weightings in LQF are based only on residential population density, while in GQF the weightings combine population density, energy consumption (broken down by fuel and consumer type), total road length (by road class) and traffic flow (by road link and vehicle type).

LQF contains a user-editable georeferenced database of historic national energy consumption, vehicle ownership (car, motorcycle and freight/bus) and total population statistics. Each grid cell receives a share of the energy consumption and vehicles based on the proportion of national population it contains. Building energy use depends on mean daily air temperature recorded for the study area via an empirical scaling function (Lindberg et al. 2013), which does not conserve the annual per-capita energy consumption derived from the database. Metabolism is simulated using the residential population at all times of day. Hourly profiles of building energy use, metabolic activity and road traffic variations are applied to capture the diurnal variations of these energy sources.
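The population-share apportionment described above can be sketched as follows, assuming numpy is available. The function name and example numbers are hypothetical. The temperature-dependent scaling of building energy (Lindberg et al. 2013) and the hourly profiles are deliberately omitted, so the sketch shows only how a national total is shared among grid cells.

```python
import numpy as np


def apportion_by_population(national_total, national_population, cell_population):
    """Share a national total (e.g. annual building energy or a vehicle count)
    among grid cells in proportion to the share of the national population
    that each cell contains; unpopulated cells receive nothing."""
    cell_population = np.asarray(cell_population, dtype=float)
    per_capita = national_total / national_population
    return per_capita * cell_population


# Hypothetical example: three grid cells, the second one unpopulated
cells = apportion_by_population(national_total=1.0e6,
                                national_population=1.0e4,
                                cell_population=[200.0, 0.0, 50.0])
print(cells)
```

Because each cell's share depends only on its residential population, any spatial structure that is absent from the population map (roads, workplaces) cannot appear in the result.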

All of the energy input into LQF and GQF is assumed to be released as latent, sensible and/or wastewater QF. GQF uses city-scale data to estimate emissions. Building emissions are calculated from spatially resolved annual energy consumption data (electricity and gas), which distinguish non-domestic from domestic consumption within the city. These are disaggregated to a common spatial resolution using workday and residential population data, respectively. Daily variations in building energy consumption are generated from electricity and gas demand data, and separate non-domestic and domestic half-hourly profiles are applied to capture diurnal variations. Unlike in LQF, annual per-capita building energy consumption is conserved in GQF. Road traffic fuel use for eight vehicle types is built up from a detailed map of road links, and vehicle-specific profiles are applied to represent diurnal variations. Metabolic emissions are represented using the residential population at night and the workday population on working days, with a transitional period between them. Each grid cell therefore contains different proportions of transport, building and metabolic QF.
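The domestic/non-domestic disaggregation step can be illustrated with a simplified sketch. It is not the GQF code. It assumes that the energy totals and both population counts refer to the same spatial unit, and that each population sums to more than zero within that unit. The function name and numbers are hypothetical. By construction, the unit's total consumption is conserved.

```python
import numpy as np


def disaggregate_buildings(domestic_total, nondomestic_total,
                           residential_pop, workday_pop):
    """Split one spatial unit's annual building energy among its grid cells:
    domestic use is weighted by residential population and non-domestic use
    by workday population (both assumed to have a non-zero sum)."""
    res = np.asarray(residential_pop, dtype=float)
    work = np.asarray(workday_pop, dtype=float)
    domestic = domestic_total * res / res.sum()
    nondomestic = nondomestic_total * work / work.sum()
    return domestic + nondomestic


# Hypothetical unit with two grid cells: the first is a workplace-dominated cell
cell_energy = disaggregate_buildings(40.0, 80.0,
                                     residential_pop=[100, 300],
                                     workday_pop=[400, 0])
print(cell_energy)
```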

2.1.1 Model configuration

LQF and GQF models were run for Greater London, UK. The study area (residential population 8.2 M, workday population 8.67 M—Office for National Statistics 2016) extends 60 × 45 km at its widest points and contains diverse land uses and substantial population variations. Data requirements are met for both of the models. A substantial change in population distribution occurs between night (Fig. 1c) and working days (Fig. 1e). This leads to large workday enhancements in the city centre and far west and modest but widespread population decreases in the majority of the city (Fig. 1h). The road network contains arterial routes, an inner ring-road in the northern half of the city and an outer highway partly intersecting the edge of the study area. These features are visible in the daily mean vehicle fuel consumption estimate produced by the GQF pre-processing steps (Fig. 1f).

Fig. 1

a, b Total QF from GQF and LQF at 500 m spatial resolution at 17:00 UTC on May 5, 2015 (same colour scale). c, d The residential population density respectively by output area (OA) and after gridding to 500 m resolution. e Gridded workday population. f Gridded total road vehicle fuel consumption estimated in GQF (Appendix 2). g Paved land cover fraction and (h) the ratio of workday (W) to residential (R) population at 500 m resolution, coloured based on which dominates. Panels (c) to (g) are shaded by their respective quintiles. Grey cells contain zero population, emissions or fuel consumption

The spatial accuracy of LQF is evaluated on a working day: Tuesday, 5 May 2015. This falls during British Summer Time (UTC + 1, which is explicitly captured in both models). On this date, the mean air temperature recorded in central London was 9.03 °C, cool enough to trigger building heating calculations in LQF. Year-long model runs were also performed to assess LQF predictions of daily building energy consumption relative to GQF.

GQF and LQF model parameters are summarised in Appendix 1, along with details of the input data sources used. LQF was assigned one transport and one building energy consumption diurnal profile, which are averages of those used in GQF, weighted respectively by the abundance of each vehicle type (outer London; DEFRA 2014) and energy consumer (city-wide values; DBEIS 2016). The profiles and GQF outputs were temporally averaged to match the 60-min time resolution of LQF. Both models were configured to generate QF at 500 m spatial resolution to approximately match that of the world-wide population datasets such as Gridded Population of the World v4 (GPWv4; Center for International Earth Science Information Network 2016).
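The two averaging steps above (an abundance-weighted mean of several profiles, then 30-min to 60-min averaging) can be sketched as follows. This sketch assumes that each profile is tabulated at 30-min resolution and that the weights are abundances (for example, vehicle counts) that need not sum to one. Function names and numbers are hypothetical; this is not the UMEP implementation.

```python
import numpy as np


def weighted_mean_profile(profiles, weights):
    """Combine several diurnal profiles (one per row) into a single profile,
    weighting each row by the relative abundance of its vehicle type or consumer."""
    profiles = np.asarray(profiles, dtype=float)
    w = np.asarray(weights, dtype=float)
    return (w[:, None] * profiles).sum(axis=0) / w.sum()


def half_hourly_to_hourly(series):
    """Average consecutive pairs of 30-min values to give 60-min values."""
    series = np.asarray(series, dtype=float)
    if series.shape[-1] % 2:
        raise ValueError("expected an even number of half-hourly values")
    return series.reshape(*series.shape[:-1], -1, 2).mean(axis=-1)


# Hypothetical two-hour example: two profile types weighted 3:1
combined = weighted_mean_profile([[1, 1, 3, 3], [3, 3, 1, 1]], weights=[3, 1])
hourly = half_hourly_to_hourly(combined)
print(combined, hourly)
```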

Modifications are applied to LQF and GQF to quantify the change in skill from varying spatial input datasets (Table 1 contains a breakdown of input data used for each variant):

  • GQF and LQF: standard model configurations

  • “LQF-Paved”: as LQF, but using remote-sensed paved area fraction data to estimate the spatial distribution of QF,T

  • “GQF-Simplified”: spatially resolved energy consumption input to GQF is replaced with city-wide values to document the effect on skill.

Table 1 Summary of input data attributes required by the different GQF and LQF configurations, with decreasingly detailed configurations further to the right. More entries in a given column imply a more detailed configuration. See Appendix 1 for details regarding the datasets used

Hourly spatial distributions of total QF were produced by GQF and LQF for the whole of 5 May 2015. Results from the 16:00 to 17:00 UTC time step are shown in Fig. 1a, b as examples. Emissions range from zero to more than 100 W m−2 over the city, with different spatial distributions resulting from each model treatment.

2.1.2 Input data

Per-capita building energy use

Per-capita building energy consumption is estimated by LQF using UK total population (62.036 M) and energy consumption statistics (Allen et al. 2011; Lindberg et al. 2013), both from 2010. In comparison, London-specific energy consumption (2014) and population (2011) data are used to calculate per-capita building energy use in GQF. Appendix 1 contains full listings of LQF and GQF input data sources and parameters.

The UK total national primary consumption (PC) in the existing LUCY database was 3.03 times greater per-capita than the city-specific value input into GQF (Fig. 2a) and 2.92 times greater than an alternative estimate of London energy consumption with transport excluded from 2014 (DECC 2014). This disparity is dominated by the use of the total primary energy supply (IEA 2016) in LUCY, which includes energy production minus exports and losses rather than consumption.

Fig. 2

Per-capita annual energy emissions for a buildings and b transport estimated by GQF, LQF, and LUCY. LUCY and LQF respectively use national primary consumption (PC) and non-transport final consumption (NTFC) (IEA, 2016) for building energy consumption and both estimate transport emissions based on vehicle ownership. London-specific energy consumption values (DECC, 2014) are shown for comparison

The 2010 UK Total Final Energy Consumption minus transport fuel, or NTFC (IEA 2016), differed from the GQF value by only 1.2% and is therefore used in LQF here instead. Changes in consumption between 2010 (LQF input) and 2014 (GQF input) are also likely to contribute to the difference between model inputs, but a full evaluation of top-down energy consumption estimates is beyond the scope of this work.

Transport fuel consumption

Published aggregate fuel consumption totals for the UK and London were unsuitable for modelling road transport emissions because they include aviation and rail transport. Instead, GQF estimates per-capita road transport fuel consumption from a map of traffic volume per road link from 2013 (London Datastore 2014). This estimate is 1.4 times greater than the equivalent LQF estimate (Fig. 2b), which is derived from UK per-capita vehicle ownership statistics (Worldmapper 2006).

Day-to-day variations

Daily mean temperature data for LQF were measured in central London during 2015 by a Davis Vantage Pro 2 Plus weather station atop Barbican Cromwell Tower, a 42-storey residential block (51.521°N, 0.0930°W, 145 m a.s.l.). Daily gas and electricity demand variations for GQF (National Grid 2016a, b) were also sourced for 2015. Section 3.2.1 compares the day-to-day variation in building energy available in each model.

LQF road traffic volume is reduced by 20% during weekends and public holidays. Diurnal and day-of-week traffic variations in GQF are governed by a single week-long profile at 30 min resolution.

Population data

Spatially resolved Greater London population maps from the 2011 UK census (ONS 2013) are used to allocate population and thus energy consumption across the spatial domains of both models. To represent the effect of grid resolution on skill, gridded population data are produced from the census output area (OA) spatial units that vary in area (Fig. 1d). The populations were redistributed to a regular grid based on the plan area index of buildings according to satellite-derived land cover classification data (Marconcini et al. 2017), using the process described in Appendix 2. This yields residential and workday populations downscaled to 500 m resolution (Fig. 1c, e) for use in the two models.
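Appendix 2 describes the process actually used. The sketch below only illustrates the general idea of building-area-weighted (dasymetric) redistribution. It assumes that the intersections of OAs with grid cells, and the building plan area within each intersection, have already been computed in a GIS. The function name and data are hypothetical. As written, an OA with no building area loses its population; a practical scheme might instead fall back to plain area weighting.

```python
from collections import defaultdict


def dasymetric_to_grid(oa_population, pieces):
    """Redistribute census output-area (OA) populations onto grid cells.

    oa_population: dict mapping OA id -> population
    pieces: iterable of (oa_id, cell_id, building_area), one entry per
            intersection of an OA with a grid cell
    Each OA's population is split among its pieces in proportion to building area.
    """
    pieces = list(pieces)
    oa_building = defaultdict(float)
    for oa, _, area in pieces:
        oa_building[oa] += area
    grid = defaultdict(float)
    for oa, cell, area in pieces:
        if oa_building[oa] > 0:
            grid[cell] += oa_population[oa] * area / oa_building[oa]
    return dict(grid)


# Hypothetical example: two OAs that both overlap grid cell "c2"
grid = dasymetric_to_grid({"A": 100, "B": 60},
                          [("A", "c1", 30), ("A", "c2", 10),
                           ("B", "c2", 5), ("B", "c3", 15)])
print(grid)
```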

2.2 Fractions skill score

The total QF output by each model at 17:00 UTC (Fig. 1a, b) demonstrates several features that the evaluation must capture:

  1. The different frequency and spatial arrangement of hot spots (emissions typically ranging from ~ 50 to over 100 W m−2 at 500 m resolution) towards the centre of London

  2. Different spatial features between models, such as enhancements around major roads

  3. The overall pattern of QF, which is more intense in the city centre

The Fractions Skill Score (FSS; Roberts and Lean 2008) is a metric that quantifies forecast skill at different spatial scales while accounting for spatial errors and bias. It was originally applied to compare forecast and observed precipitation fields (e.g. Roberts and Lean 2008; Mittermaier and Roberts 2010), and has also been used to evaluate modelled cloud brightness temperatures (e.g. Griffin et al. 2017) and volcanic ash plumes (Harvey and Dacre 2016). The FSS is calculated for different regimes of interest (e.g. ranges of QF intensity (Table 2), rainfall rate or ash concentration) to highlight different aspects of model behaviour. In this study, regimes are selected based on QF magnitude and typically occupy certain regions of the city. For example, QF,B emissions are consistently higher in the city centre, so the upper regimes are found there (Fig. 3a). For a given regime, the FSS is estimated as follows:

  1. Grid cells belonging to the regime are tagged.

  2. A neighbourhood of fixed size surrounding every grid cell in the model domain is chosen.

  3. Spatial accuracy is assessed in each neighbourhood (Roberts 2008). For the jth neighbourhood:

     (a) The fraction of tagged cells in the neighbourhood is calculated in the candidate model grid (Mj).

     (b) The corresponding value is calculated in a reference grid (Oj), which may be another model or observations.

  4. The fractions are calculated in all N neighbourhoods and combined into the FSS:

$$ FSS=1-\frac{\sum_{j=1}^{N}\left(O_j-M_j\right)^2}{\sum_{j=1}^{N}O_j^2+\sum_{j=1}^{N}M_j^2} $$

Table 2 Definitions of anthropogenic heat flux (QF) intensity regimes used in model comparison, with corresponding QF values to illustrate the magnitude of each threshold in terms of GQF total QF, and the proportion of the total GQF hourly energy emitted from each regime. Ranges are used because thresholds are re-evaluated at each time step

Fig. 3

Examples of a where each emissions regime (Table 2) falls during a GQF time step (building emissions 5 May 2015 at 19:00 UTC), and maps that yield b high (informative), c medium (marginally above FSSuseful) and d low (non-informative) FSS values for the G90+ regime. Purple cells indicate where LQF and GQF emissions coincide, and pink and blue cells show mismatches between LQF and GQF (respectively) emissions. Note that different times and QF components are used to demonstrate each mismatch

The FSS is calculated for multiple regimes to evaluate different aspects of the model output. Ordinarily, neighbourhoods of varying size are evaluated to determine spatial accuracy across different scales. Spatial resolution rather than scale is of interest in this work, so a single-pixel neighbourhood is used with model outputs with progressively coarser resolutions.
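For n = 1 (the case used here), the neighbourhood fractions reduce to the 0/1 regime masks themselves. Eq. 1 then becomes one minus the number of mismatched cells divided by the combined regime cell count of both grids. The sketch below implements the general neighbourhood form, so the single-pixel case can be seen as a special case. It assumes numpy and boolean regime masks on a common grid. Zero padding at the domain edge is an assumption, and all names are hypothetical.

```python
import numpy as np


def neighbourhood_fractions(mask, n):
    """Fraction of tagged cells in the n x n neighbourhood centred on each cell.
    Cells beyond the domain edge are treated as untagged (zero padding)."""
    if n < 1 or n % 2 == 0:
        raise ValueError("neighbourhood size n must be a positive odd integer")
    b = np.asarray(mask, dtype=float)
    pad = n // 2
    padded = np.pad(b, pad, mode="constant")
    total = np.zeros_like(b)
    for di in range(n):
        for dj in range(n):
            total += padded[di:di + b.shape[0], dj:dj + b.shape[1]]
    return total / (n * n)


def fss(candidate_mask, reference_mask, n=1):
    """Fractions Skill Score (Eq. 1) for one regime; n = 1 compares cells one to one."""
    M = neighbourhood_fractions(candidate_mask, n)
    O = neighbourhood_fractions(reference_mask, n)
    denominator = np.sum(O ** 2) + np.sum(M ** 2)
    if denominator == 0:
        return np.nan  # regime absent from both grids: FSS undefined
    return 1.0 - np.sum((O - M) ** 2) / denominator


# Hypothetical 2 x 2 example: one tagged cell agrees, one is displaced
reference = np.array([[1, 1], [0, 0]], dtype=bool)
candidate = np.array([[1, 0], [1, 0]], dtype=bool)
score_pixel = fss(candidate, reference, n=1)  # single-pixel neighbourhood
score_3x3 = fss(candidate, reference, n=3)    # 3 x 3 neighbourhood
print(score_pixel, score_3x3)
```

With n = 1 the score is 1 − 2/4 = 0.5. With a 3 × 3 neighbourhood, every cell's neighbourhood spans the whole 2 × 2 grid, so the displacement is forgiven and the score is 1. This is why coarsening (or enlarging neighbourhoods) tends to raise the FSS.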

The FSS ranges from 0 (complete mismatch) to 1 (perfect agreement). Values below 1 indicate poorer skill but do not reveal the underlying reasons for it. The frequency bias (the ratio between the candidate and reference models of the number of grid cells in a regime) is therefore calculated separately to aid interpretation. A lower limit, FSSuseful, is defined alongside the FSS (Roberts and Lean 2008) to signify whether the candidate model gives an informative ("useful") prediction of the reference:

$$ {FSS}_{useful}=0.5+\frac{f}{2} $$

where f is the overall fraction of tagged cells in the reference grid. A random forecast with the same coverage scores approximately f, and a perfect forecast scores 1. FSSuseful lies half-way between the two.
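Eq. 2 and the frequency bias can be computed from the same regime masks, as in the sketch below. It assumes numpy and masks that cover only cells inside the study domain, so that f is the tagged fraction of the domain. The function names and the 100-cell example are hypothetical.

```python
import numpy as np


def fss_useful(reference_mask):
    """Eq. 2: half-way between the score of a random forecast with the same
    coverage (about f) and a perfect score of 1, where f is the tagged
    fraction of the reference grid."""
    f = np.mean(np.asarray(reference_mask, dtype=bool))
    return 0.5 + f / 2.0


def frequency_bias(candidate_mask, reference_mask):
    """Candidate/reference ratio of regime cell counts (> 1: over-prediction)."""
    n_ref = np.count_nonzero(reference_mask)
    if n_ref == 0:
        return np.nan  # regime absent from the reference grid
    return np.count_nonzero(candidate_mask) / n_ref


# Hypothetical 100-cell domain: 10 reference cells and 13 candidate cells in the regime
reference = np.zeros(100, dtype=bool)
reference[:10] = True
candidate = np.zeros(100, dtype=bool)
candidate[5:18] = True
print(fss_useful(reference), frequency_bias(candidate, reference))
```

A rare regime (small f) therefore has a threshold close to 0.5, whereas a regime covering most of the domain requires an FSS close to 1 to count as informative.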

The FSS provides an intuitive measure of overlap between models. The effect of spatial consistency is visualised using three examples (Fig. 3b–d), in which the two models produce similar numbers of cells within a regime (low frequency bias) but with varying degrees of spatial consistency.

Fig. 4

Kernel density estimate of total daily building energy emissions in LQF (ELQF) relative to GQF (EGQF) over a spring, b summer, c autumn and d winter months of 2015, and e the whole year (same x-axis scale on each). Values of ELQF/EGQF > 1 indicate over-estimates by LQF with respect to GQF. Vertical dashed line represents the ratio found on 5 May 2015, the focus of this study (n = 365). LQF building emissions vary with air temperature while GQF emissions use empirical demand data

3 Results

LQF and GQF were run for Greater London at 500 m spatial resolution, consistent with the finer-scaled global population datasets (e.g. GPWv4). Results are compared for each component of QF and each hour of output on a typical working day (Tuesday, 5 May 2015). Model outputs are compared primarily at the 500 m base spatial resolution. Coarser resolutions are obtained by spatially averaging the outputs in post-processing. A maximum cell size of 5 km keeps the sample above 50 grid cells in the 60 × 45 km domain.
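All grid cells have equal area, so averaging flux densities (W m−2) over blocks of cells is an appropriate way to coarsen the outputs. The sketch below coarsens a 500 m field by an integer factor (a factor of 10 gives 5 km). It assumes numpy and a grid aligned with fixed block boundaries. Discarding incomplete edge blocks and ignoring NaN cells (e.g. outside the city) are assumptions, not documented choices of this study. The function name is hypothetical.

```python
import numpy as np


def coarsen(grid, factor):
    """Average a 2-D field over non-overlapping factor x factor blocks.
    Rows/columns that do not fill a complete block are discarded, and NaN
    cells are ignored in the mean (an all-NaN block yields NaN with a warning)."""
    g = np.asarray(grid, dtype=float)
    ny = (g.shape[0] // factor) * factor
    nx = (g.shape[1] // factor) * factor
    blocks = g[:ny, :nx].reshape(ny // factor, factor, nx // factor, factor)
    return np.nanmean(blocks, axis=(1, 3))


# Hypothetical 4 x 4 field coarsened by a factor of 2
field = np.arange(16.0).reshape(4, 4)
coarse = coarsen(field, 2)
print(coarse)
```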

Skill at different model resolutions from 500 m to 5 km is compared during night-time (00:00–01:00 UTC) and daytime (06:00–07:00 UTC for transport, 11:00–12:00 UTC otherwise) to capture low/domestically dominated and high/non-domestically dominated emissions, respectively.

3.1 Selection of regimes

QF intensity regimes (defined in Table 2) are selected based on thresholds [W m−2] applied to both model grids. Thresholds are estimated using quantiles of the emission intensity found in the GQF output grid. The resulting FSS therefore indicates if emissions are of the correct magnitude, and whether emissions of a given magnitude occur in the correct locations. Separate thresholds are calculated for each time step, QF component and spatial resolution.
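A sketch of the threshold and tagging steps is given below, assuming numpy. Table 2 defines the regimes. The band edges used here are inferred from the area fractions quoted in Sect. 3.3 (50, 40, 9 and 1% of non-zero cells), and are therefore an assumption. Specifically, the sketch computes quantiles over non-zero GQF cells and treats G50−, G50+, G90+ and G99+ as exclusive bands. Function names and the example field are hypothetical.

```python
import numpy as np


def regime_thresholds(reference):
    """50th, 90th and 99th percentiles of the non-zero reference (GQF) emissions,
    recomputed for every time step, QF component and resolution."""
    r = np.asarray(reference, dtype=float)
    nonzero = r[np.isfinite(r) & (r > 0)]
    return np.percentile(nonzero, [50, 90, 99])


def regime_masks(grid, thresholds):
    """Boolean masks for the No-QF regime and four exclusive intensity bands,
    applying the same reference-derived thresholds to any model grid."""
    g = np.asarray(grid, dtype=float)
    q50, q90, q99 = thresholds
    return {
        "No-QF": g == 0,
        "G50-": (g > 0) & (g < q50),
        "G50+": (g >= q50) & (g < q90),
        "G90+": (g >= q90) & (g < q99),
        "G99+": g >= q99,
    }


# Hypothetical reference field: one zero cell and 100 non-zero cells
gqf = np.arange(0.0, 101.0)
masks = regime_masks(gqf, regime_thresholds(gqf))
counts = {name: int(mask.sum()) for name, mask in masks.items()}
print(counts)
```

Using the reference thresholds for both grids means that a candidate which is correct in pattern but too weak overall will shift cells into lower regimes. This produces frequency biases rather than being hidden by model-specific quantiles.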

The regimes are chosen such that the No-QF and G99+ cases evaluate spatial intermittency and extremes, respectively. The G50−, G50+ and G90+ regimes evaluate whether the bulk of QF values are predicted and assigned correctly, indicating whether emissions are skewed higher or lower than in GQF. The No-QF regime is evaluated only for QF,T, because the two models use different datasets to estimate spatial intermittency for transport. For the other components, population data (with zero values in identical locations) are used. Skill in a given QF regime is considered to be informative if FSS > FSSuseful.

3.2 City-wide results

3.2.1 Day-to-day variations

GQF uses historical demand data to estimate daily building energy consumption, while LQF makes predictions based on ambient temperature (Sect. 2.1.2). It is informative to establish the magnitude of the differences between these values, although resolving them is beyond the scope of this work. Based on model runs for all of 2015, the LQF/GQF city-wide daily building emissions ratio (Fig. 4e) varies from 0.6 to 1.5. The distribution is bimodal, with modes near 1.1 and 0.8. A seasonal breakdown (Fig. 4a–d) shows that this arises from systematic under-estimates by LQF in autumn and strong bimodality in winter, whereas spring and summer show a central tendency about 1. The mode at approximately 0.8 occurs because measured energy use generally exceeds that predicted by LQF when the temperature exceeds the LQF balance point (Lindberg et al., 2013), which de-activates artificial heating in the model calculations.

Fig. 5

Time series showing the time evolution of (upper) mean total QF for all of London on 5 May 2015 and (lower) the proportion contributed by each QF component in a GQF and b LQF runs. GQF further separates building emissions into domestic (Dm) and non-domestic (NonDm) building emissions

These over/under-estimates are applied with equal weighting across all buildings in LQF, and therefore introduce frequency biases that reduce the FSS. The ratio is 0.97 on the study date (a typical mid-range value), so differences in building energy estimation methods are unlikely to confound the results of the spatial analysis.

3.2.2 Diurnal variations

The variation in city-wide mean QF is consistent between models. On 5 May 2015, GQF and LQF (Fig. 5a, b) reach maxima of 12 and 10 W m−2, respectively, at 09:00 UTC. Building energy dominates QF at all times. Transport emissions are proportionally greatest between 06:00 and 07:00 UTC, when building energy is still rising. Building emissions are greatest during the working day (non-domestic emissions) and evening (domestic emissions). The proportions shown in Fig. 5a vary spatially because GQF captures separate spatial patterns of transport, domestic and non-domestic emissions.

3.3 Total anthropogenic heat flux (QF)

The time-of-day variation of FSS for total QF (Fig. 6a) shows that agreement between the modelled emissions peaks at night, when residential building emissions dominate QF. Skill falls during the day, when transport and non-domestic emissions are greatest (Fig. 5). The G50− and G50+ regimes (50 and 40% of the area, respectively) are predicted with informative skill all day, albeit with lower skill during daytime. G90+ (9% of the area) is predicted informatively for much of the day, but falls below the informative threshold at 06:00, 07:00 and 15:00 UTC. The G99+ regime, which covers 1% of areas with non-zero QF, is predicted with negligible skill at all times.

Fig. 6

Fractions skill scores (Eq. 1) for total QF in Greater London (500 m resolution) based on LQF and GQF-Standard, by a time of day on 5 May 2015 and b spatial resolution at 07:00 UTC. One line is shown per QF intensity regime (Sect. 3.1). Darker lines indicate higher QF; filled points denote informative FSS levels according to Eq. 2, and hollow points non-informative levels

Corresponding frequency biases at 01:00 and 07:00 UTC (Table 3, rows 1 and 2) show that LQF predicts no emissions in the G99+ regime, hence the absence of skill. Biases in the G90+ regime are consistent at both times. The reduction in FSS at 07:00 UTC is therefore likely caused by a change in the spatial QF distribution coinciding with increased traffic activity.

Table 3 Frequency biases (Candidate/Reference) in each emissions regime, by QF component (Sect. 1) and model variant (Sect. 2.1.1). Values greater than 1 indicate over-prediction by the candidate, and values under 1 indicate under-prediction. Non-informative entries are omitted. Table 2 defines the emissions regimes. Note that daylight saving time was in force on the date of the study

At 07:00 UTC, progressively coarsening the spatial resolution, r, towards 5 km (Fig. 6b) raises the skill of the G90+ regime to an informative level at r = 1 km. There is no such effect on the G99+ skill, because the fixed boundaries of the model grid cause the highest emissions to consolidate in different regions of the domain as r approaches 5 km.

3.4 Transport emissions (QF,T)

The same approach is used to evaluate QF,T emissions. Here the No-QF regime is included, to indicate whether LQF adequately predicts the spatially intermittent structure of the road network (Fig. 1f).

None of the QF,T regimes is predicted informatively at any time of day (Fig. 7a). LQF greatly underestimates the frequency of grid cells in the No-QF, G90+ and G99+ regimes (Table 3, rows 3 and 4). The frequency of emissions in the G50− regime is overestimated by 45%, though the G50+ regime frequency is within 15% of the correct value. LQF therefore produces excessive QF,T away from roads and underestimates emissions near major roads by spreading the available energy too thinly. This reflects how the road network structure differs from the residential population density (Fig. 1c). Coarsening the resolution to 5 km does not improve skill to an informative level, except in the No-QF regime. That regime improves only because emissions occur in all 5 km grid cells.

Fig. 7

As Fig. 6a, but for transport emissions using a LQF and b LQF-Paved, where daily emissions match GQF and emissions are assigned using paved area fraction rather than residential population

An improved spatial distribution was sought by normalising the available daily transportation energy to that in GQF and using remote-sensed paved area fraction data in place of population density. This gives rise to the LQF-Paved configuration, which increased the FSS only marginally at 500 m resolution (Fig. 7b) because paved areas (Fig. 1g) resemble an amalgam of population and road network (Fig. 1c, e, f) rather than just the road network. As with the standard LQF configuration, the LQF-Paved frequency biases at 01:00 UTC (Table 3, row 5) show strong under-representation in the No-QF, G90+ and G99+ regimes, and over-prediction of the G50+ regime is strengthened. There is negligible bias in the G50− regime, so poor FSS here is caused by a lack of spatial consistency.

3.5 Building emissions (QF,B)

QF,B is predicted informatively by LQF at all times of day in the G50−, G50+ and G90+ regimes (Fig. 8a). Frequency biases at 01:00 and 07:00 UTC are under 3% in the G50− and G50+ regimes, and the G90+ regime is overestimated by 31% (Table 3, rows 6 and 7). As with total QF, the QF,B skill reduces during daytime when non-domestic emissions dominate in GQF, but not to the extent that skill falls below an informative level. Coarsening the spatial resolution to 5 km (not shown) does not increase the FSS in the G99+ regime.

Fig. 8

As Fig. 6a, but showing time series of FSS for building emissions skill (500 m spatial resolution) for a LQF and b GQF-Simplified against GQF-Standard and c LQF against GQF-Simplified

The lack of skill in the G99+ regime may arise from three aspects of the GQF input data that are not captured by LQF:

  1. Hour-to-hour energy consumption differs strongly between models because GQF uses sector-specific values and diurnal profiles.

  2. The sector-specific building energy consumption datasets loaded into GQF are provided in spatially resolved form and generally show greater non-domestic consumption towards the city centre. In contrast, a city-wide consumption value is used in LQF.

  3. Non-domestic energy consumption is disaggregated to the required spatial resolution using workday population, which has a different spatial structure to residential population (Fig. 1c, e). LQF disaggregates all energy consumption using residential population.

Hourly total building emissions in the two models were found to differ by less than 7%, suggesting that (1) is not the primary cause of the lack of skill.

The GQF-Simplified configuration uses city-wide energy consumption totals instead of spatially resolved input files to assess the effect of (2) on the G99+ regime. The resulting FSS (Fig. 8b) represents the skill of GQF-Simplified at predicting GQF. The FSS across regimes ranges from 0.5 (G99+) to 0.95 (G50−) and remains at informative levels all day, although it should be noted that the G99+ regime is only marginally above the FSSuseful threshold in the evening. The effect of GQF-Simplified is to reduce energy use in the city centre and the abundance of grid cells in the G99+ regime by 57–58% (Table 3, rows 8 and 9) at 01:00 and 07:00. This energy redistributes to grid cells in the G90+ regime, with a 23% increase in abundance. This corresponds to the energy from each lost G99+ cell being spread over 3.6 G90+ cells.
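The figure of 3.6 cells follows from the nominal regime sizes. This assumes that the G99+ and G90+ regimes cover 1% and 9% of the non-zero cells (Sect. 3.3), and takes 57.5% as the midpoint of the 57–58% reduction:

$$ \frac{0.23\times 9\%}{0.575\times 1\%}=\frac{2.07\%}{0.575\%}\approx 3.6 $$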

LQF was compared to GQF-Simplified to evaluate (3). The G99+ regime of GQF-Simplified is still predicted with negligible skill by LQF (Fig. 8c), albeit with minor improvements from 17:00 to 00:00 UTC, again caused by an absence of cells in the G99+ regime (Table 3, rows 10 and 11). This indicates the lack of skill in upper regimes is a combination of factors (2) and (3).

3.6 Metabolic emissions (QF,M)

GQF estimates night-time QF,M using residential population and daytime emissions using workday population, with transitional periods during morning and evening and increased daytime metabolic activity per person. LQF uses residential population data at all times of day, and transitions to (from) increased metabolic activity during the morning (evening).
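The difference between the two treatments can be made concrete with a sketch of how a GQF-like scheme might blend residential and workday populations. The linear transitions and the transition hours (07:00 to 09:00 and 17:00 to 19:00) are hypothetical, chosen purely for illustration. GQF's actual transition profile is not reproduced, and neither are either model's per-person metabolic rates. Only a working day is represented.

```python
def transition_weight(hour, start, end):
    """Fraction of the way from the night-time state (0) to the daytime state (1),
    varying linearly between the hypothetical `start` and `end` hours."""
    if hour <= start:
        return 0.0
    if hour >= end:
        return 1.0
    return (hour - start) / (end - start)


def metabolic_population(hour, residential, workday,
                         morning=(7.0, 9.0), evening=(17.0, 19.0)):
    """Population used for metabolic heat on a working day: residential at night,
    workday during the day, and linear blends during the (hypothetical)
    morning and evening transitions."""
    w = transition_weight(hour, *morning) - transition_weight(hour, *evening)
    return [(1 - w) * r + w * d for r, d in zip(residential, workday)]


# Hypothetical two-cell example: a residential cell and a workplace cell
for hour in (3, 8, 12, 18, 23):
    print(hour, metabolic_population(hour, residential=[100, 0], workday=[0, 200]))
```

An LQF-like treatment corresponds to keeping w = 0 at all hours (residential population only), with only the per-person rate changing between rest and activity. This is why the two models disagree most strongly during the transitions.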

Differences between workday and residential populations lead to lower skill during the day than at night (Fig. 9). In the G50+, G90+ and G99+ regimes, the FSS rapidly worsens to non-informative levels at 05:00–07:00 and 22:00 UTC, because day/night transitions take place differently in the two models. All regimes except G99+ are predicted with informative skill outside the transitional periods, and G99+ is predicted skilfully at 19:00–21:00 and 23:00 UTC.

Fig. 9

As Fig. 6a, but showing FSS for metabolic emissions skill from LQF compared with GQF-Standard

Non-informative skill in the QF,M G99+ regime (Table 3, rows 12 to 14) is caused by frequency biases. These arise from different assumptions about resting metabolic rates and work schedules in the two models. At night the G99+ regime is over-predicted by a factor of 3.3, because LQF assumes each person emits 75 W while GQF assumes 64.3 W. At 07:00 UTC, LQF predicts 14.6 times more G99+ cells, because its transition to active metabolic rates begins earlier than in GQF, where the transition is conditioned on work/home rather than sleep/wake. By 12:00 UTC the transitions are complete and the workday population dominates in GQF, whereas LQF predicts zero G99+ cells; this reflects the high localised density of the workday population. Coarsening the spatial scale to 5 km (not shown) did not improve skill in the G99+ regime.

Night-time disagreement between models is trivial to resolve by adopting a consistent resting metabolic rate, but the differences during transitions between rest, wakefulness and work reflect the different levels of model detail.

3.7 Spatial variation of skill

Spatial variation in LQF skill is visualised by calculating the proportion of the day for which each grid cell resides in a regime with informative skill. This reliability (Fig. 10) is labelled as “consistently informative” (FSS > FSSuseful during all hours), “intermittently informative” (some hours) or “poor” (never).
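
The classification can be sketched as follows. The sketch assumes that each cell is assigned a single regime index per hour (for example, the highest regime it reaches) and that a Boolean table records, for each hour and regime, whether FSS exceeded FSSuseful. Both array layouts and the function name are hypothetical.

```python
import numpy as np


def skill_rating(regime_by_hour: np.ndarray, informative_by_hour: np.ndarray):
    """Rate each grid cell by how often its regime was predicted with informative skill.

    regime_by_hour: int array (hours, rows, cols), regime index occupied by each cell.
    informative_by_hour: bool array (hours, n_regimes), True where FSS > FSSuseful.
    Returns (ratings, fraction of hours informative).
    """
    hours = np.arange(regime_by_hour.shape[0])[:, None, None]
    # Look up, for every cell and hour, whether the regime it occupied was skilful
    cell_ok = informative_by_hour[hours, regime_by_hour]
    frac = cell_ok.mean(axis=0)
    ratings = np.full(frac.shape, "intermittently informative", dtype=object)
    ratings[frac == 1.0] = "consistently informative"
    ratings[frac == 0.0] = "poor"
    return ratings, frac


# Two hours on a 1 x 3 grid with three regimes (toy example)
regimes = np.array([[[0, 1, 2]], [[0, 2, 1]]])
informative = np.array([[True, False, True], [True, False, False]])
ratings, frac = skill_rating(regimes, informative)
print(ratings, frac)
```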

Fig. 10

Grid cell skill rating, indicating whether the regime(s) in which each grid cell resided during 5 May 2015 were predicted with useful skill at all hours, some hours or never. Results are shown for (a) total QF, (b) QF,B and (c) QF,M. Transport emissions are omitted because no useful skill was found (Fig. 7).

Total QF is consistently informative in approximately 90% of the city area (Fig. 10a). Areas around major roads in the north, east and west of the city, where the unskilled transport emissions are strongest, are reduced to intermittently informative. The city centre contains the poorly predicted G99+ regime, which is dominated by building and transport emissions.

For QF,B (Fig. 10b), over 98% of the city falls into regimes predicted with consistently informative skill, and the G99+ regime at the centre is predicted without skill. The difference between Fig. 10a and b highlights the effect of roads on overall skill. Areas near the centre are intermittently informative because some grid cells fall into different regimes over the day. Metabolic emissions from LQF (Fig. 10c) are intermittently informative in the central half of the city, where the workday population dominates, and consistently informative in the residential outer 50%. QF,T is not included because skill is consistently poor in all cases.

4 Discussion

4.1 Skill, emissions and energy

The total energy available for building emissions differed between LQF and GQF on days with a mean temperature above 12 °C because LQF assumes no heating occurs above this temperature. The study date showed good agreement between the models; on other dates, the resulting bias reduces the FSS. It is stressed, however, that other prediction methods or empirical demand data could be used with the LQF approach instead.
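
The effect of a fixed no-heating temperature can be illustrated with a heating-degree formulation. This is an assumption made for illustration and is not necessarily how LQF implements its temperature relation: daily building energy is a temperature-independent baseline plus a term proportional to the number of degrees below a base temperature. The 12 °C cut-off comes from the text; the baseline, slope and the 15.5 °C base temperature of the hypothetical "observed" city are invented values, and the function names are hypothetical.

```python
import numpy as np

LQF_NO_HEATING_ABOVE_C = 12.0  # daily mean temperature above which LQF assumes no heating


def heating_degrees(t_mean, base_temp=LQF_NO_HEATING_ABOVE_C):
    """Degrees below the base temperature; zero when warmer than the base."""
    return np.maximum(0.0, base_temp - np.asarray(t_mean, dtype=float))


def building_energy(t_mean, baseline, heating_per_degree, base_temp=LQF_NO_HEATING_ABOVE_C):
    """Daily building energy = temperature-independent baseline + heating term."""
    return baseline + heating_per_degree * heating_degrees(t_mean, base_temp)


def fit_heating_model(t_mean, demand, base_temp):
    """Least-squares fit of baseline and heating slope for a chosen base temperature."""
    slope, intercept = np.polyfit(heating_degrees(t_mean, base_temp), demand, 1)
    return intercept, slope


t = np.linspace(0.0, 25.0, 51)                            # daily mean temperatures (°C)
observed = building_energy(t, 80.0, 4.0, base_temp=15.5)  # hypothetical city: heating up to 15.5 °C
baseline, slope = fit_heating_model(t, observed, base_temp=15.5)
lqf_like = building_energy(t, baseline, slope)            # same fit, heating switched off above 12 °C
print(np.max(observed - lqf_like))                        # largest daily shortfall from the cut-off
```

Fitting the base temperature and slope to local demand data, rather than fixing them, is one way the temperature relation could be evaluated for each city.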

FSS and area coverage are not related to the total energy contained within a regime (Table 2 contains a full breakdown of QF intensities and energy partitioning). The G50+ regime contains ~ 50% of energy, G90+ around 25% and G99+ approximately 10%. The areal extent of a regime therefore does not reflect its energetic significance, and obtaining good skill in the upper regimes may be more important for urban energy balance considerations if the focus is on high spatial resolution or hot spots.
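
The mismatch between area and energy can be quantified by comparing, for each regime, the fraction of cells it covers with the fraction of total QF it contains. This sketch treats a regime as all cells above the corresponding percentile of the field (an assumption; if the regimes are exclusive bands, the masks would instead be taken between successive thresholds) and uses a synthetic skewed field, so its numbers are illustrative and do not reproduce Table 2. Because all cells have equal area, summed flux density is proportional to energy.

```python
import numpy as np


def regime_shares(qf: np.ndarray, percentiles=(50, 90, 99)) -> dict:
    """Area fraction and energy fraction of cells above each percentile threshold."""
    total = qf.sum()  # equal-area cells, so this is proportional to total energy
    shares = {}
    for p in percentiles:
        mask = qf > np.percentile(qf, p)
        shares[f"G{p}+"] = {
            "area": float(mask.mean()),
            "energy": float(qf[mask].sum() / total),
        }
    return shares


rng = np.random.default_rng(1)
qf_field = rng.lognormal(mean=2.0, sigma=1.0, size=(100, 100))  # synthetic QF (W m-2)
shares = regime_shares(qf_field)
for name, s in shares.items():
    print(f"{name}: {s['area']:.1%} of area, {s['energy']:.1%} of energy")
```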

4.2 Spatial accuracy

Total city-wide emissions in each component are consistent between models. LQF reproduced much of the GQF QF,B spatial variability with an informative level of skill in most areas but was unable to accurately reproduce the city centre hot spots present in GQF output, with energy instead spread elsewhere. Accurate hot spot prediction requires workday population data, spatially constrained energy consumption data and the ability to discriminate between domestic and non-domestic emissions:

  1. Spatially resolved domestic and non-domestic building energy consumption data constrain emissions within different regions of the city.

  2. Workday and residential populations indicate likely fine-grained building energy demand patterns during the day and night.

  3. Separate diurnal profiles for the energy consumption datasets link emissions in different areas to particular times of day.

LQF predicted spatial variations of QF,T poorly in all regimes and at all times of day, smoothing emissions over unsuitably large areas even at resolutions as coarse as 5 km, including when they were disaggregated using paved area fraction instead of population density. A road network map is required to address this; crowd-sourced vector data such as OpenStreetMap (2017) represent a potential avenue (subject to coverage) if assumptions are made about the division of traffic between major and minor roads.
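
A road-network allocation of the kind suggested here could look like the sketch below. It assumes that road lengths per grid cell have already been extracted (for example from OpenStreetMap) and classed as major or minor, and that a fixed, user-chosen fraction of traffic energy is carried by major roads. The 0.7 default and the function name are hypothetical placeholders, not values from this study.

```python
import numpy as np


def allocate_transport(total_w: float, major_len, minor_len, major_share: float = 0.7) -> np.ndarray:
    """Distribute a total transport heat release (W) over grid cells by road length.

    major_share is the assumed fraction of traffic energy carried by major roads.
    """
    major_len = np.asarray(major_len, dtype=float)
    minor_len = np.asarray(minor_len, dtype=float)
    major_total, minor_total = major_len.sum(), minor_len.sum()
    if major_total == 0 and minor_total == 0:
        raise ValueError("no roads in the domain")
    # If one road class is absent, all traffic is assigned to the other class
    if major_total == 0:
        major_share = 0.0
    elif minor_total == 0:
        major_share = 1.0
    out = np.zeros_like(major_len)
    if major_total > 0:
        out += total_w * major_share * major_len / major_total
    if minor_total > 0:
        out += total_w * (1.0 - major_share) * minor_len / minor_total
    return out


major = np.array([[1000.0, 0.0], [0.0, 0.0]])  # metres of major road per cell (toy)
minor = np.array([[0.0, 500.0], [500.0, 0.0]])  # metres of minor road per cell (toy)
print(allocate_transport(1.0e6, major, minor))   # cells without roads receive nothing
```

Unlike a population-based proxy, this concentrates emissions along the network and leaves road-free cells at zero, at the cost of needing an assumed traffic split.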

QF,M is predicted with informative levels of skill at night if accurate assumptions are made about per-person emissions. Differences in assumed resting and working times cause large transient losses of skill in the morning and evening. LQF predicted daytime QF,M with poorer skill because it did not have access to workday population data, although only the hot spot regime fell below an informative level of skill during the daytime.

Total QF skill reflects that of the individual components. LQF is non-informative at all times of day in the city centre because of building-related hot spots and non-informative during some hours of the day near major roads and dense parts of the road network. Since transport contributes less energy than buildings, consistently informative total QF skill can be obtained by coarsening the spatial resolution from 500 m to 1 km.
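
Coarsening from 500 m to 1 km can be done by averaging each 2 × 2 block of flux density, which preserves the domain-mean flux. This is a minimal sketch assuming a regular grid whose dimensions are multiples of the coarsening factor; the regridding procedure used in the study is not described in this section, and the function name is hypothetical.

```python
import numpy as np


def coarsen(field: np.ndarray, factor: int) -> np.ndarray:
    """Block-average a 2-D flux-density field (W m-2) by an integer factor."""
    rows, cols = field.shape
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of the coarsening factor")
    # Split each axis into (coarse index, position within block) and average the blocks
    blocks = field.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))


qf_500m = np.arange(16, dtype=float).reshape(4, 4)  # toy 4 x 4 grid at 500 m
qf_1km = coarsen(qf_500m, 2)                         # 2 x 2 grid at 1 km
print(qf_1km)
```

A hot spot displaced by one 500 m cell can fall within the same 1 km block after coarsening, which is the intuition behind using coarser grids when input data are scarce.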

GQF is likely to represent true QF emissions more accurately than LQF; however, this study cannot establish how closely GQF matches reality.

5 Conclusions

A simple model (LQF) based principally on residential population and national statistics has limited accuracy at spatial resolutions from 0.5 to 5 km when compared with the output of a more detailed model (GQF), which uses city-specific parameters and distinguishes different energy uses:

  • At the whole-city scale, daily building and road emissions were within ± 40% of city-specific values, with building emissions underestimated on warmer days.

  • Elevated inner-city emissions, dominated by buildings, were displaced by LQF, and hot spots were missed entirely. This is attributed to the absence of workday population and non-domestic energy use patterns in LQF.

  • Outer-city emissions were replicated reliably by LQF as they are dominated by domestic buildings.

  • Transport emissions were predicted poorly throughout the city because population and paved area fraction data attributed emissions across too great an area.

  • Metabolic emissions were captured skilfully by LQF except during transitions between rest and activity.

We recommend that, if detailed modelling is impractical because of limited input data, simple models based on residential population density patterns be used conservatively:

  • Resolutions no finer than ~ 1 km should be used to mitigate the effects of not modelling population movement (e.g. from home to work).

  • Transport emissions should be based on road network maps rather than a proxy, especially where major orbital and trunk roads displace traffic volume from dense populations.

  • The relation between temperature and energy use should be evaluated for each study city. Amongst other improvements made to LUCY (now LQF) through this work, optional parameters have been added to the LQF software to permit this.

  • Errors arising from misplaced emission hot spots should still be expected despite these measures.

As new techniques are developed to obtain QF (e.g. Chrysoulakis et al. 2016), and high-resolution urban modelling uses QF as an input (e.g. Loridan et al. 2010; Chen et al. 2011; Bohnenstengel et al. 2014; Best and Grimmond 2016), estimation methods must be evaluated objectively so that their appropriateness can be judged and their limitations addressed. Given the challenge of obtaining all the data necessary to run a detailed model like GQF, enriching a simpler LQF-type model with complementary data may be a more fruitful way of improving the quality of predictions.

Despite extensive inputs, the models discussed here are essentially static and do not explicitly consider the effect of localised or widespread disruptions to human activity. Developing methods that emulate the dynamics of human behaviour is therefore essential so that spatially heterogeneous QF predictions can be made (Barlow et al. 2017). In turn, this will support urban energy balance and surface-atmosphere interaction modelling at progressively higher spatial and temporal resolutions.