The urban thermal environment has been widely studied in the context of the urban heat island (UHI) effect (Oke 1976), which occurs when air temperature is consistently higher in urban areas than in their rural surroundings, because of its implications for human health (Heaviside et al. 2016, 2017), ecology (Yow 2007), and energy use (Santamouris et al. 2015). Air temperature can be approximated by land surface temperature (LST) (Sheng et al. 2015), measurements of which are readily available through remotely sensed satellite imagery. Images at 30 to 120 m resolution acquired from Landsat, or at 90 m from ASTER, have been most frequently used in studies relating urban LST to the 2D or 3D spatial configuration and composition of urban landscapes, e.g. Connors et al. (2013); Chen et al. (2014); Zhou et al. (2020); Sun et al. (2020a, b). Because of the coarse spatial resolution of the LST imagery, these studies focus on explaining LST within relatively large and variedly defined subdivisions of towns, and may therefore lack sufficient detail regarding the spatial configuration of individual land cover patches contributing to thermal comfort outdoors (Perini et al. 2017; Li et al. 2020c) or within building interiors (Futcher et al. 2013; Garshasbi et al. 2020), which is relevant to microscales rather than neighbourhoods or city districts. The use of fine resolution land cover data in studies relating spatial configuration of urban form to LST has been recommended (Li et al. 2013; Liu et al. 2016) because such data accurately represent fragmented urban landscapes and so increase the robustness of the analyses. However, the use of equally fine resolution LST imagery has never been investigated in this context.

Therefore, the overarching objective of this work is to address the spatial scale limitations of previous studies by exploring the relationship between spatial configuration descriptors (SCDs) of urban form and LST at a rarely available fine spatial resolution of 2 m for both LST and land cover data. This enables a focus on individual land cover patches rather than larger fragments of towns, and provides a link between coarse- and micro-scale studies, the latter being possible only for small areas at a time (Ramyar et al. 2019). Following Guyot et al. (2021), we define urban form as the spatial organization of urban physical elements such as buildings, streets and parcels, and expand this definition to include greenspaces and water. We are particularly interested in determining the spatial configuration conditions associated with the formation of the coldest and hottest land cover patches, defined through clustering of LST values across the study area. We hypothesise that (1) the spatial configuration properties of the core land cover patches, together with the properties of land cover located in their neighbourhoods, are the determining factors of the temperature of the core patches, (2) urban form patterns conducive to the formation of the coldest and hottest land cover patches change as the summer progresses, and (3) a set of spatial configuration rules exists that can guide planning and urban design for better thermal regulation of cities in the summer. Additionally, we determine the optimal neighbourhood size and spatial resolution of LST imagery for patch-orientated studies. The assessment is carried out over a very large representative sample of urban form patterns collected throughout three British suburban towns, with thermal properties approximated by LST captured on two summer days a month apart.
The novelty of our approach lies in its focus on explaining the LST of individual land cover patches rather than areas of a city, the very high resolution of the LST data, and the inclusion of several rarely used SCDs. Furthermore, we adopted a flipped methodological approach: rather than quantifying the cooling distances of urban green–blue spaces, as for example in a recent and insightful study by Sun et al. (2020b), we investigated the size and composition of target patches’ neighbourhoods, with an aspiration to provide urban design recommendations tailored to their spatial characteristics.

Materials and methods

Study area

The study area comprises three towns located in relatively close proximity in England: Milton Keynes (52°0′N, 0°47′W, approx. 122 km²), Bedford (52°8′N, 0°27′W, approx. 60 km²), and Luton/Dunstable (51°52′N, 0°25′W, approx. 86 km²) (Fig. 1), with populations of 229,941, 106,940, and 258,018, respectively (Office for National Statistics 2013), and a temperate oceanic climate according to the Köppen–Geiger climate classification system. The three towns have contrasting histories (modern-day garden city, medieval, and industrial, respectively) and collectively represent a wide range of urban form patterns, described in more detail in Grafius et al. (2016) and Zawadzka et al. (2020a).

Fig. 1

Land cover in A–Milton Keynes, B–Bedford, C–Luton/Dunstable. The inset depicts the location of the towns within Great Britain. Analyses were carried out for areas within the ‘Built-up Area Extent’ boundary. Adapted from Zawadzka et al. (2020b)


The data used in this study comprise several datasets derived and described in previous work: very high spatial resolution (2 m) LST maps downscaled from Landsat 8 TIR imagery for 6th June and 8th July 2013 using statistical methods (Zawadzka et al. 2020a); a land cover map showing the distribution of five main land cover types at 2 m spatial resolution (Grafius et al. 2016; Fig. 1); and a map of land cover subtypes obtained from a two-tiered k-means clustering analysis of selected landscape metrics and LST, which yields land cover patches classified according to their spatial configuration and temperature (Zawadzka et al. 2020b). The very high resolution LST data were downscaled from Landsat 8-derived thermal images using multivariate adaptive regression splines relating LST to spectral indices and the distribution of impervious surfaces and water, with overall accuracy, measured by root mean square error (RMSE), ranging from 1.4 to 1.83 K depending on date and town. The two dates were chosen because they provided the only cloudless summer-time Landsat 8 images available for the three towns at the time of the downscaling study. Additionally, this study makes use of elevation represented by a digital surface model, and feature height data at 2 m resolution derived from the NERC-ARSF Leica ALS50-II LiDAR survey carried out over the three towns (Grafius et al. 2016), as well as the Landsat 8-derived LST image at its original spatial resolution. This resolution is nominally 100 m but in practice lies somewhere between 30 and 100 m, because the USGS, the Landsat data provider, resamples the thermal bands to match the 30 m spatial resolution of the visible and infra-red bands of the Landsat 8 images.


Relationships between urban form and LST of LC patches

The methodology for elucidating the relationships between the coldest and hottest land cover subtypes and the spatial configuration of urban form can be summarised in three major steps: (1) generation of land cover subtype patches characterised by the highest and lowest LST in June and July, (2) identification of the most important SCDs influencing the formation of the hottest and coldest patches of a given land cover subtype, and of the distance at which patches located in the neighbourhood can influence the LST of the core patches, and (3) analysis of the most important SCDs of urban form associated with the coldest and hottest land cover subtype patches over the course of a warming summer (Fig. 2).

Fig. 2

Overview of the methodology applied to elucidate the relationship between spatial configuration of urban form and the formation of the coldest and hottest LC subtype patches. LC land cover, cLHS conditioned Latin hypercube sampling, SCDs spatial configuration descriptors, which included spatial aggregation metrics (COHESION, LSI, PLADJ), distances to land cover classes, area, elevation, and feature heights. * refers to methodology steps described in detail in Zawadzka et al. (2020b)

Land cover subtype patches were derived separately for each land cover class (buildings, grass, trees, paved and water) by a two-tiered k-means clustering process. In Tier 1, clustering of selected class-level landscape metrics (McGarigal and Marks 1995) related to the spatial aggregation of land cover (COHESION, PLADJ and LSI) yielded subdivisions of the main land cover classes with distinct spatial and thermal properties (Table S3 Supplementary Materials). The three landscape metrics, which are rarely used in this type of study, were selected from a larger pool of shape and aggregation metrics based on the strength of their correlations with LST data within each land cover class (Zawadzka et al. 2020b). The land cover type patches formed in this way were attributed with means of LST for June and subsequently for July, and further subdivided through Tier 2 k-means clustering to determine those land cover subtype patches that had the highest, intermediate, and lowest LST at both time steps.
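The two-tiered logic can be illustrated with a minimal sketch. It is not the authors' implementation: the `kmeans` and `two_tier` helpers, the toy metric and LST values, and the choice of two clusters per tier are all assumptions made for illustration, and no variable scaling is applied.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples; returns one label per point."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centre by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])))
            for p in points
        ]
        # Update step: move each non-empty cluster's centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def two_tier(metrics, lst, k1, k2):
    """Tier 1 clusters the aggregation metrics; Tier 2 clusters LST within each Tier 1 cluster."""
    tier1 = kmeans(metrics, k1)
    tier2 = [None] * len(metrics)
    for c in set(tier1):
        idx = [i for i, lab in enumerate(tier1) if lab == c]
        sub = kmeans([lst[i] for i in idx], min(k2, len(idx)))
        for i, lab in zip(idx, sub):
            tier2[i] = lab
    return tier1, tier2

# Hypothetical patches: (COHESION, PLADJ, LSI) and (mean LST June, mean LST July).
metrics = [(95, 80, 3.0), (96, 82, 3.2), (94, 81, 2.9), (96, 79, 3.1),
           (60, 40, 9.0), (62, 42, 9.5), (61, 41, 8.8), (59, 43, 9.1)]
lst = [(20.0, 24.0), (20.5, 24.5), (27.0, 31.0), (27.5, 31.2),
       (21.0, 25.0), (21.5, 25.5), (28.0, 32.0), (28.5, 32.2)]

tier1, tier2 = two_tier(metrics, lst, k1=2, k2=2)
```

Each patch ends up with a Tier 1 label describing its aggregation type and a Tier 2 label describing its thermal group within that type.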

Because of the vast number of individual cluster patches across the three towns (approximately 2 million), subsampling at a rate of 10% of the total number of patches in each Tier 2 cluster was necessary. It was implemented through the conditioned Latin hypercube sampling (cLHS) method (Minasny and McBratney 2006) available in the ‘clhs’ R package (Roudier 2011). The cLHS method allowed for the creation of a representative sample of all spatial configuration and thermal properties of land cover subtype patches taken into account in this study, which included spatial aggregation metrics, elevation, height, area, and distance to land cover patches of different types (Table 1).
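The idea behind this kind of sampling can be sketched as follows. This is a deliberately simplified illustration, not the ‘clhs’ package or the published algorithm: it handles continuous variables only, uses a greedy random-swap search, and scores a sample solely by how evenly it fills equal-count quantile strata of each variable. The data, function names and parameters are hypothetical.

```python
import random
from bisect import bisect_right

def strata_edges(values, n):
    """Inner edges of n equal-count quantile strata for one variable."""
    s = sorted(values)
    return [s[(len(s) * j) // n] for j in range(1, n)]

def objective(sample_idx, data, edges):
    """Sum over variables and strata of |points in stratum - 1|; 0 means a perfect hypercube."""
    n = len(sample_idx)
    total = 0
    for v, e in enumerate(edges):
        counts = [0] * n
        for i in sample_idx:
            counts[bisect_right(e, data[i][v])] += 1
        total += sum(abs(c - 1) for c in counts)
    return total

def clhs_sketch(data, n, iters=2000, seed=0):
    """Greedy swap search for n rows that cover every stratum of every variable."""
    rng = random.Random(seed)
    n_vars = len(data[0])
    edges = [strata_edges([row[v] for row in data], n) for v in range(n_vars)]
    chosen = rng.sample(range(len(data)), n)
    best = objective(chosen, data, edges)
    for _ in range(iters):
        pos = rng.randrange(n)
        cand = rng.randrange(len(data))
        if cand in chosen:
            continue
        trial = chosen[:pos] + [cand] + chosen[pos + 1:]
        score = objective(trial, data, edges)
        if score <= best:  # keep swaps that do not worsen stratum coverage
            chosen, best = trial, score
    return chosen, best

# Hypothetical descriptors for 200 patches: (PLADJ, elevation, LSI).
_rng = random.Random(1)
data = [(_rng.uniform(0, 100), _rng.gauss(100, 20), _rng.uniform(1, 10)) for _ in range(200)]

sample, score = clhs_sketch(data, n=20)  # 10% subsample
```

A lower `score` means the subsample covers the range of each descriptor more evenly, which is the property that makes the reduced sample representative.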

Table 1 Type, categories and groups of metrics used to mathematically represent the conceptual model of urban form as well as summarise the outcomes of LST modelling

Quantifying the impact of the spatial configuration of urban form on the LST of land cover subtype patches required knowledge of the spatial configuration properties of land cover patches located in the neighbourhood of the core land cover patches. This was achieved by deriving 10 buffer zones of increasing size (from 10 m to 100 m, in 10 m steps) around a subsample of each Tier 2 cluster, except for water (Fig. 3). A spatial join was carried out between feature classes representing maps of Tier 1 clusters and buffer zones to identify those Tier 1 clusters that intersected with each buffer zone. All GIS operations were implemented in ArcMap 10.6. Since the unique ID number of the core Tier 1 patches was known, it was then possible to link the spatial properties of the core patches to the properties of patches in their neighbourhoods in a one-to-many relationship. The resulting table was subsequently transformed so that each core land cover cluster patch was attributed with sums or means of the properties of land cover type patches in the neighbourhood, depending on descriptor type (Table 1).
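The transformation of the one-to-many table into one row per core patch can be sketched as below. The field names, the toy values, and the choice of which descriptor is summed (area) and which is averaged (PLADJ) are assumptions for illustration; the actual assignment follows Table 1.

```python
from collections import defaultdict

BUFFER_SIZES_M = list(range(10, 101, 10))  # 10, 20, ..., 100 m

# One row per (core patch, neighbouring patch) pair, as a spatial join would produce.
# Field names and values are hypothetical.
pairs = [
    {"core_id": 1, "buffer_m": 10, "lc": "trees", "area_m2": 120.0, "pladj": 80.0},
    {"core_id": 1, "buffer_m": 10, "lc": "trees", "area_m2": 80.0, "pladj": 70.0},
    {"core_id": 1, "buffer_m": 10, "lc": "paved", "area_m2": 300.0, "pladj": 90.0},
    {"core_id": 2, "buffer_m": 10, "lc": "grass", "area_m2": 50.0, "pladj": 60.0},
]

def summarise_neighbourhood(pairs, sum_fields=("area_m2",), mean_fields=("pladj",)):
    """Collapse neighbour rows to one summary per (core patch, buffer size, land cover class)."""
    groups = defaultdict(list)
    for row in pairs:
        groups[(row["core_id"], row["buffer_m"], row["lc"])].append(row)
    out = {}
    for key, rows in groups.items():
        summary = {f + "_sum": sum(r[f] for r in rows) for f in sum_fields}
        summary.update(
            {f + "_mean": sum(r[f] for r in rows) / len(rows) for f in mean_fields}
        )
        out[key] = summary
    return out

summary = summarise_neighbourhood(pairs)
```

Here core patch 1 receives separate neighbourhood descriptors for trees and paved within its 10 m buffer, which is the shape of input the subsequent models need.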

Fig. 3

Conceptual model of urban form implemented in this study to elucidate the relationship between LST of core LC patches of different types and spatial configuration of urban form, represented here by real data for a location in Milton Keynes

The most effective SCDs of urban form for explaining the LST of land cover patches in June and, separately, in July were identified through random forest (RF) models (Breiman 2001) implemented in the ‘ranger’ R package (Wright and Ziegler 2017). Random forests construct multiple decision trees, each using a random subset of the available variables, to solve classification or regression problems. They can independently assess prediction error by randomly excluding data points from the training of each tree and using them for testing, and they can determine the order of importance of predictors. RF regression models were constructed for all clusters in each land cover class (RFALL), except for water, and separately for each land cover type (Tier 1 cluster: LA, least aggregated; RLA, relatively less aggregated; RMA, relatively more aggregated; and MA, most aggregated), allowing for the determination of the specific conditions for explanation of LST in core patches with different spatial properties. RF models, each containing 500 trees to ensure stability of predictions, were constructed with three subsets of SCDs: (1) SCDs of core land cover patches only, (2) SCDs of core land cover patches and of land cover patches intersecting with the buffer zones of different sizes, and (3) SCDs of land cover patches intersecting with the buffer zones only. Out-of-bag R2 and RMSE metrics were used to determine the predictive power of the models, the size of the neighbourhoods with the most significant impact on LST in the core land cover patches, and the spatial resolution of LST imagery (2 m vs 100 m) yielding more accurate results.
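For reference, the two performance metrics can be written out as a short sketch. It assumes the common definitions RMSE = sqrt(mean squared error) and R2 = 1 - SSE/SST applied to out-of-bag predictions; the exact formula used by the modelling software may differ slightly, and the values below are made up.

```python
import math

def rmse(observed, predicted):
    """Root mean square error, in the units of the observations (here K)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def r_squared(observed, predicted):
    """Share of variance explained: 1 - residual sum of squares / total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst

# Hypothetical LST of four core patches and their out-of-bag predictions.
observed = [20.0, 22.0, 24.0, 26.0]
predicted = [21.0, 21.0, 25.0, 25.0]

model_rmse = rmse(observed, predicted)
model_r2 = r_squared(observed, predicted)
```

Comparing these two numbers across buffer sizes and LST resolutions is what allows the neighbourhood size and imagery resolution to be ranked.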

The most important SCDs of urban form influencing the LST of land cover patches were identified by calculating the percentage of the total variance explained by a given model that could be attributed to each predictor. Given the very large number of variables taken into account (up to 66 for models constructed with the properties of the core patches and patches intersecting with the buffer zones), these were grouped hierarchically, as shown in Table 1, to facilitate a structured discussion of the results. The groups comprised up to three levels: predictor type, i.e. properties related to patch area, aggregation, distance to other patches, elevation, and height, without specification of location (core vs neighbourhood) or the land cover class they referred to; predictor category, i.e. the distinction of the aforementioned properties between core patches and patches intersecting with a buffer zone; and predictor group, naming the land cover class that the predictor type and category referred to. The purpose of the predictor type grouping was to assess which type of SCDs had the greatest influence on the LST of the core land cover patches; the predictor category indicated whether these descriptors pertained to the core patches themselves or to the patches situated in their neighbourhood; and the predictor group identified the land cover classes that these SCDs referred to. The full list of SCDs considered in this study is given in Table S2 in Supplementary Materials.
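The hierarchical grouping of predictor importance can be sketched as follows. The importance values and the key layout (type, category, group) are hypothetical and only illustrate how per-predictor shares roll up through the three levels.

```python
from collections import defaultdict

# (predictor type, category, land cover group) -> raw importance; values are hypothetical.
importance = {
    ("elevation", "buffer", "all"): 30.0,
    ("aggregation", "buffer", "trees"): 25.0,
    ("aggregation", "buffer", "paved"): 15.0,
    ("distance", "buffer", "water"): 20.0,
    ("area", "core", "buildings"): 10.0,
}

def percent_by_level(importance, level):
    """Percentage of total importance per group.

    level = 1 groups by type, 2 by type and category, 3 by type, category and group.
    """
    total = sum(importance.values())
    shares = defaultdict(float)
    for key, value in importance.items():
        shares[key[:level]] += 100.0 * value / total
    return dict(shares)

by_type = percent_by_level(importance, 1)
by_category = percent_by_level(importance, 2)
```

Summing at level 1 answers which kind of descriptor matters most, while levels 2 and 3 show whether that influence comes from the core patch or its neighbourhood and from which land cover class.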

Once the most important predictors of LST were identified, their values were analysed to determine specific spatial configuration conditions for the formation of the coldest (C), medium-cold (M-C), medium-hot (M-H) and hottest (H) Tier 2 clusters (Table S4 Supplementary Materials) derived for June and July within each land cover type (Tier 1 cluster), with particular emphasis on the coldest and hottest land cover patches. This was achieved through pairwise Wilcoxon ANOVA, analysis of basic statistics (min, max, mean, median, 25th and 75th percentiles) as well as visual comparison through boxplots. These analyses were carried out in R, using the ‘stats’, ‘psych’ and ‘ggplot2’ packages, respectively (R Core Team; Revelle 2019; Wickham 2016).


Impact of core land cover class and type on LST

The impact of the core patch land cover class and type on the ease of predicting LST with the conceptual model of spatial configuration of urban form implemented in this study was assessed using the R2 and RMSE model performance metrics for the RFc+bf10m models predicting LST at 2 m spatial resolution (Fig. 4), under the assumption that better model performance indicates a better capacity of our approach to explain the LST of a given land cover class and type. Whilst average R2 values for the RFLA-RLA-RMA-MA models for each land cover class (excluding ALL, to avoid bias resulting from the different numbers of land cover subtype patches incorporated in each model) did not differ greatly between different land cover types (0.84–0.86 in June and 0.87–0.90 in July), RMSE had a wider spread of values. The LST of buildings was best explained in our analysis, with the lowest overall RMSE of 0.78 K in June and 0.76 K in July. The LST of grass was least accurately explained (RMSE of 1.04 K and 1.00 K, respectively). RF models constructed for the LST of paved (0.93 K and 0.89 K) and trees (0.97 K and 0.94 K) had intermediate levels of accuracy, indicating that the LST of built-up rather than green spaces was better represented in our approach.

Fig. 4

Root mean square error (a) and R2 (b) obtained from RF models relating LST at two dates (June and July 2013) and spatial resolutions (2 m and 100 m) to spatial configuration descriptors for all patches of a given LC class (ALL) and separately for LC patches contained within Tier 1 clusters (LA least aggregated, RLA relatively less aggregated, RMA relatively more aggregated, MA most aggregated). ‘Core’ refers to models constructed with spatial configuration descriptors for core patches only, whilst 10 m, etc., indicate models with addition of patches intersecting with consecutive zones around the core patches

Accuracy of predictions of LST within land cover subtypes differed with overall aggregation level of land cover patches and displayed a common trend such that small and fragmented land cover (LA) patches had a lower RMSE and higher R2 than largest and most aggregated (MA) ones. An exception to this rule was the LA subtype in paved representing mostly footpaths located near greenspaces, with LST prediction error being higher than that of residential roads (RLA), likely caused by a difficulty of accurately representing these narrow and elongated features at 2 m spatial resolution.

LST and scale effects

The relationship between LST of various land cover classes and types and spatial configuration of urban form was considered in the context of multiple scale effects: (1) the spatial resolution of LST imagery, (2) the distance over which spatial configuration of land cover impacts the LST of the core patch, and (3) temporal scales.

The difference in RMSE (Fig. 4) between RF models predicting LST at 2 m and at 100 m spatial resolution in June varied in magnitude depending on land cover class and type, and on neighbourhood size. In all cases in June, RMSE for LST at 2 m resolution in buildings was lower by 0.05 to 0.77 K and tended to decrease with increasing neighbourhood size. For the remaining land cover classes (grass, trees and paved), the neighbourhood size influenced the sign of the difference between RMSE at the two spatial resolutions: predictions of LST at 2 m resolution tended to be more accurate when the spatial configuration properties of land cover patches intersecting with zones up to 30 to 60 m away from the core land cover patches were included, depending on the type of core land cover patch. Here, the predictions at 2 m resolution were more accurate by 0.43 to 0.02 K at smaller neighbourhood sizes, and predictions at 100 m resolution were more accurate by 0.07 to 0.18 K at larger neighbourhoods. Trends observed for predictions in July were very similar and involved improvement of predictions for LST at 100 m over 2 m resolution with increasing neighbourhood size.

Model performance metrics (R2 and RMSE, Fig. 4) indicated that in all cases the inclusion of spatial properties of land cover patches located in the neighbourhood improved the power of the models to predict core patch LST. In fact, models constructed without the spatial properties of the core land cover patches performed comparably to the equivalent models that included them (Figure S1 Supplementary Materials), suggesting that the surroundings of individual land cover patches play a pivotal role in LST regulation. In all cases, the sharpest increase in R2 and reduction in RMSE occurred on inclusion of the spatial configuration properties of land cover patches intersecting with the 10 m buffer zone; the incremental improvement from further increases in neighbourhood size was negligible at 2 m resolution and more pronounced at 100 m resolution. This suggests that the most immediate surroundings have the largest impact on the LST of land cover patches; consequently, further discussion of the most important SCDs is based on the models constructed for the 10 m neighbourhood.

The R2 values returned by models predicting the LST of core land cover patches from spatial configuration descriptors of core patches and patches in the 10 m neighbourhood showed that models constructed with data for July explained comparable, though slightly higher, amounts of LST variance than those for June, with R2 ranging from 0.75 to 0.90 in June and from 0.75 to 0.94 in July at 2 m spatial resolution. Comparable RMSE values were also achieved (0.64 to 1.40 K and 0.64 to 1.34 K, respectively), suggesting that, as far as the overall impact of spatial configuration of urban form on the LST of individual land cover patches is concerned, the relationships are maintained at two time steps a month apart during a warm (non-heatwave) summer in an intermediate climate.

Relationship between spatial configuration of urban form and LST of land cover patches

Spatial configuration predictors determining LST in June and July

Analysis of the percentage of the total variance explained by RFc+bf10m models by SCDs grouped into the main predictor categories (Fig. 5) shows that spatial properties of land cover patches intersecting with the 10 m buffer zone had a greater LST explanatory power than the properties of the core patches. Out of these, elevation, aggregation and distances to other land cover classes had the best explanatory power, with order alternating somewhat between land cover class, type and date, with elevation gaining distinctly in importance as summer progressed. Core patch height and area, on the other hand, had consistently the lowest explanatory power of LST, with the remaining categories of predictors having intermediary impact, which was still distinctly lower than that of the top three predictor categories.

Fig. 5

Percentage of total variance of LST in (a) buildings, (b) grass, (c) paved, (d) trees explained by RF models attributed to the main LST predictor categories (Table 1) in June and July at 2 m spatial resolution. Predictors are sorted by the decreasing mean percentage of total variance explained for the two dates

Predictor groups name the specific land cover classes for which a given predictor category was derived (Figures S2a and S2b Supplementary Materials). In buildings, spatial aggregation of all land cover patches apart from water, and especially of neighbouring trees, was the most important LST predictor after elevation. Distance to water, followed by distance to buildings, was important as well. Whilst feature heights were generally of lesser importance, in the case of LA buildings the heights of buildings and trees located in the immediate proximity of the core buildings stood out as more important than for other building types (Figure S3 Supplementary Materials).

Apart from elevation, aggregation of trees and distance to buildings of neighbouring land cover patches were the most important LST descriptors in grass, with the exception of the LA grass patches, for which distance to buildings was less important than aggregation of paved patches or distance to water, the latter two being important for explanation of LST in all types of grass patches as well (Figure S4 Supplementary Materials).

In paved land cover patches, besides elevation, distance of neighbouring land cover classes to buildings and aggregation of trees were important SCDs, with the former being more important for the LA and MA patches and the latter for the RLA and RMA patches. Here, other important factors included distance to water as well as aggregation of neighbouring grass patches and buildings (Figure S5 Supplementary Materials).

Apart from elevation, distance to water and distance to buildings were important LST predictors for patches of trees, with aggregation of other land cover classes remaining quite important. The order of importance varied somewhat between tree patch types, with distance to buildings being more important for the RMA and MA tree patches and aggregation of grass for the LA and RLA tree patches (Figure S6 Supplementary Materials).

Spatial configuration of urban form conducive to the formation of coldest and hottest land cover patches

Spatial configuration patterns of urban form conducive to the formation of the coldest and hottest land cover patch subtypes, with LST differences ranging from 3.9 to 6.6 °C (Table 2), within a given land cover type (Tier 1 cluster) in June and July were determined by identifying LST predictors whose means differed statistically significantly between Tier 2 clusters, according to the ANOVA analysis, and whose ranges between the first and third quartiles did not overlap for the coldest and hottest clusters (Figures S7-S37 Supplementary Materials). Whilst selected ranges are shown in Table 3, means and standard deviations as well as results of the ANOVA analysis are shown in Table S1 and Figures S7-S37 Supplementary Materials.
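The quartile criterion can be illustrated with a small sketch. The helper names, the quantile interpolation method and the PLADJ values are assumptions for illustration; the significance test itself is not reproduced here.

```python
from statistics import quantiles

def iqr(values):
    """First and third quartiles (linear interpolation, 'inclusive' method; an assumption)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q3

def iqr_separated(cold, hot):
    """True when the interquartile ranges of the two clusters do not overlap."""
    c1, c3 = iqr(cold)
    h1, h3 = iqr(hot)
    return c3 < h1 or h3 < c1

# Hypothetical PLADJ (%) of trees in the 10 m buffer of coldest and hottest patches.
pladj_cold = [80, 82, 85, 88, 90]
pladj_hot = [55, 60, 63, 66, 70]

separated = iqr_separated(pladj_cold, pladj_hot)
overlapping = iqr_separated([60, 64, 68, 72, 76], [66, 70, 74, 78, 82])
```

A descriptor passing this check yields threshold-style statements such as "coldest patches have tree PLADJ above a given value", which is the form in which the ranges are reported.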

Table 2 Mean and standard deviation (in brackets) of LST [°C] of Tier 2 subdivisions of land cover types (Tier 1 clusters) in June and July acquired from 2 m resolution images, including the LST difference between the hottest and the coldest land cover patches (H minus C)

Elevation was an important discerning factor of the hottest clusters, which were typically located above 112-152 m a.s.l. depending on land cover class and type. Coldest patches of buildings, grass and paved were associated with highly aggregated patches of trees intersecting with the 10 m buffer zone, with PLADJ greater than 73 to 85% and COHESION greater than 93 to 97%, and buildings requiring somewhat lower aggregation levels than grass or paved. Hottest patches of these land cover classes were associated with PLADJ smaller than 63–69% and COHESION smaller than 83–87%. Aggregation level of grass patches in the buffer zone was a discerning factor of the coldest patches of LA and RLA and coldest and hottest RMA and MA patches of trees. The coldest tree patches were located next to highly aggregated grass patches with PLADJ greater than 67–84% and COHESION greater than 84–94%. Hottest RMA and MA tree patches were associated with less aggregated patches of grass, with PLADJ of less than 53–62% and COHESION less than 70–80%. Aggregation level of paved patches, with some exceptions, was associated with the formation of the coldest and hottest patches of buildings, grass, and trees, with coldest patches of these land cover classes being associated with PLADJ smaller than 69–79% and hottest patches with PLADJ greater than 77–87%. In all cases, an increasing trend in aggregation of trees, grass or paved associated with the coldest and hottest patches was observed as the aggregation level of core patches increased, and no major differences between months were observed. Hottest and coldest building patches of a given type could also be discerned based on aggregation level of core buildings and buildings located in the buffer zone, with LSI associated with the coldest buildings indicating a higher aggregation level than that of the hottest buildings. 
Aggregation level of buildings in the 10 m buffer zone was a discerning factor for coldest and hottest clusters of paved patches, with coldest patches being associated with more aggregated buildings indicated by LSI smaller than 3.2–4.5 and hottest patches—with less aggregated buildings, with LSI greater than 3.8–6.2. Visual examples of spatial patterns of land cover conducive to the formation of hottest and coldest buildings are shown in Fig. 6.

Fig. 6

Examples of spatial configuration of trees, paved and buildings associated with the formation of the coldest (a) and hottest (b) buildings of different subtypes: LA least aggregated, RLA relatively less aggregated, and MA most aggregated

Where distance to water is concerned, the coldest clusters of all land cover classes and types were associated with closer proximity to water bodies than the hottest ones, at less than 46–166 m and more than 399–676 m, respectively. More aggregated types of land cover patches were typically associated with closer proximity to water, both for the coldest and hottest patches, than the less aggregated ones. Distances to buildings were also helpful in discerning the coldest patches of grass, paved and trees, which were formed farther away from buildings; this distance increased with increasing aggregation level of these land cover classes, ranging from 8 to 86 m. A distance to buildings of less than 6–16 m could only be used to discern the hottest MA patches of grass, trees and paved (Table 3).

Table 3 Ranges of selected most important descriptors of spatial configuration of urban form attributed to the coldest and hottest LC patches of different subtypes


Methods in data preparation and analysis

This study represents a unique approach to analysis of the relationship between LST and urban form by attempting to explain LST, through analysis of spatial configuration of urban form, of individual land cover patches rather than the LST of variously defined sub-divisions, often referred to as analytical units, of a town as is the case in similar studies, e.g. Zhou et al. (2011); Kong et al. (2014); Liu et al. (2016); Simwanda et al. (2019); Masoudi et al. (2019). This was made possible through the availability of downscaled LST imagery (Zawadzka et al. 2020a) to a resolution better aligned with sharp and complex land cover boundaries typical of urban areas and consequently reducing the mixed pixel effect (Yow 2007) between contrasting thermal responses of adjacent land cover classes.

Whilst the use of analytical units in other studies, e.g. 900 m blocks in Berger et al. (2017), was in part necessitated by the need to reduce the computational requirements of the analysis, we used the cLHS method to reduce the sample size without compromising the robustness of the outcomes. The cLHS method analyses the feature space of a dataset so that the sample includes observations across the full range of all variables. It has been successfully applied to optimise sampling design in digital soil mapping, including soil modelling with random forests (Wadoux et al. 2019), and in LiDAR cloud data processing for accurate DEM generation (Chu et al. 2014).

Our data did not exhibit strong linear relationships between LST of core land cover patches and SCDs and as a result we used random forests models, capable of finding non-linear relationships in large non-normally distributed datasets, to identify best descriptors for LST of core land cover patches, followed by an analysis of means and quartiles to determine values of these SCDs contributing to a particular thermal effect in core land cover patches. This is in contrast to reported methods in other studies, where correlation and linear regression were typically adopted (Wang et al. 2017; Masoudi et al. 2019) with a few exceptions, such as random forests (Gage and Cooper 2017; Lemus-Canovas et al. 2020) or spatial regression models (Yin et al. 2018).

The focus of this study was set not only on LST of land cover patches of a given class but also type, defined by the aggregation level of patches of a given class, which in most cases could be associated with different functional imprints within the study area, and allowing for bottom-up considerations regarding the relationship between urban form patterns and LST depending on predominant land use. Literature lists several other studies that have attempted to explain LST means within different functional units of cities by coupling with urban form configuration metrics, such as for example Beijing city transects aggregated into specific functional zones (Li et al. 2020b), different types of parks (Li et al. 2020a), and regulatory plan management units (Yin et al. 2018), allowing for a top-down analysis of the relationships. We propose that the bottom-up approach adopted here can provide complementary insights into spatial arrangement of urban land cover under various uses for effective excess urban heat alleviation and microclimate management by exploring urban form detail that can be missed when descriptors of heterogeneous urban form patterns and thermal responses are averaged over larger parcels of land.

Relationships between LST and spatial configuration descriptors

Curves of the RMSE of random forest models predicting the LST of core land cover patches against the increasing size of buffer zones drawn around them showed the greatest improvement in accuracy at the 10 m mark for LST data at both 2 m and 100 m resolutions. Beyond that mark, RMSE nearly levelled off for the former and continued to drop for the latter, without a clear levelling-off effect at the maximum buffer zone size of 100 m considered here. The effect for 100 m resolution LST data is consistent with the continuously increasing correlation coefficients between Landsat-8 derived LST and landscape metrics at neighbourhood sizes even beyond 1000 m (Masoudi et al. 2019), and in line with the major range of variograms for 100 m LST data in the three towns of 900-1100 m (Table S3, Supplementary Materials). Whilst variograms constructed for LST data at 2 m resolution levelled off at the 250-450 m marks, the greatest impact of spatial patterns of urban form on core patches’ LST occurs at much closer range. Since our analysis took into account the properties of entire patches intersecting a particular buffer zone, even if they extended beyond its boundary, the actual zone of immediate impact likely approximates the 50 m block size recommended for urban design in the context of temperature regulation by Bartesaghi-Koc et al. (2019), or the maximum distance of 30 to 50 m beyond which the cooling effect of urban parks on air temperature was undetectable (Takebayashi 2017).

Out of the available pool of landscape metrics, we utilised only a small subset of class aggregation metrics (COHESION, LSI and PLADJ) and only one patch-level metric, i.e. area, excluding a whole range of aggregation and shape metrics widely used in other studies. The inclusion of these class aggregation metrics was justified by the unstable correlations between LST and various shape metrics derived at the 2 m pixel level, as well as redundancy or low correlations with other available aggregation metrics (Zawadzka et al. 2020b) for different land cover classes across the three towns. Among the aggregation metrics, PLADJ, COHESION of trees, grass, and paved, and LSI of buildings were more relevant for the explanation of core patches’ LST. Where any of these three metrics were used in other studies, they were typically considered less important due to lower correlations with the mean LST of analytical units.
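As an illustration of what a class aggregation metric captures, the sketch below computes the percentage of like adjacencies (PLADJ) for one class on a small categorical grid. It is a simplified version under stated assumptions: adjacencies are counted in the four cardinal directions with the double-count method, and only adjacencies inside the grid are tallied, so the treatment of landscape boundaries and background cells may differ from that of standard landscape metric software. The function name `pladj` and the grid encoding are hypothetical.

```python
def pladj(grid, focal):
    """Percentage of like adjacencies for class `focal` on a 2D grid.

    Assumptions: 4-neighbour adjacency, double-count method, and no
    adjacencies counted across the outer edge of the grid.
    """
    rows, cols = len(grid), len(grid[0])
    like = 0   # adjacencies between two cells of the focal class
    total = 0  # all adjacencies that involve a focal-class cell
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != focal:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    total += 1
                    if grid[rr][cc] == focal:
                        like += 1
    return 100.0 * like / total if total else 0.0
```

With the same number of cells of a class, a single compact block of that class scores higher than isolated cells scattered across the grid, which is the sense in which higher PLADJ values indicate more aggregated land cover.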

Contrary to previous studies, e.g. Zhou et al. (2011) and Jenerette et al. (2016), the areas of the core and neighbouring patches were among the least important LST predictors in the analysis carried out for the 10 m buffer zone and 2 m LST data, which is likely due to the relatively small zone of influence as well as the very fine spatial resolution of both the LST and land cover data considered here. The areas of buildings, trees and paved patches located in the neighbourhood gained somewhat in importance, but did not exceed the importance of patch aggregation metrics, in models constructed for the 100 m buffer zone and LST at 2 m. The area of buildings located in the 100 m buffer zone exceeded the importance of aggregation metrics for LST at 100 m resolution in models predicting the LST of all land cover classes (data not shown), indicating a declining sensitivity of the analysis to more refined SCDs at coarsening spatial resolutions of the input LST data. This behaviour is consistent with observed scale effects governing the relationship between landscape metrics and LST (Kong et al. 2014), and confirms the importance of considering spatial configuration metrics of land cover patches (Li et al. 2012), such as their spatial aggregation, at micro-scales.

Elevation, which has not been used in other urban LST studies, was the strongest explanatory factor of LST. Whilst the cooling impact of increasing elevation on air temperature is well known, we detected the opposite outcome, whereby higher ground exhibited higher LST, and the effect was exacerbated as summer progressed. Whilst lower elevations could be related to closer proximity to water bodies exerting a cooling effect, locations on higher ground could potentially be exposed to more incoming solar radiation. The decrease, in July, in the elevation associated with hotter land cover patches could possibly be explained by the decreased humidity of the ground and air as compared to June.

Heights of buildings and trees in our study had only a complementary impact on LST relative to other metrics. This is in line with the findings of Berger et al. (2017), where these two descriptors were less strongly correlated with LST than 2-dimensional metrics, such as impervious surface area or vegetation fraction, but contradicts the findings of Gage and Cooper (2017) and Sun et al. (2020a, b), where the height of trees was one of the most important LST predictors in random forest models constructed for areas with specific land composition patterns within a suburban town. By detecting a varied importance of tree and building heights to the LST of different land cover classes and types, our study provides additional insights into the impacts of the vertical structure of urban form on LST. Exclusion of elevation data, which could have been conflated with feature heights in our study, from the RF models did not improve the importance of heights in LST prediction.

In our study, distance to water was the most important distance-related descriptor of LST, with land cover patches located nearer to water being cooler, as was the case for distance to the sea or rivers in Barcelona in the summer (Lemus-Canovas et al. 2020). It should be noted that water coming from anthropogenic sources, such as industrial outlets, may act as a heat source rather than a sink (Wu et al. 2014). Water may also have a warming effect in colder seasons of the year, as demonstrated in the Barcelona case study, or a lower cooling capacity in summer than in spring due to warming up of the water, amongst other factors (Hathway and Sharples 2012).

Distances to buildings, either of core patches or of patches of grass, paved or trees in the neighbourhood, were an important explanatory factor of LST, which can be related to the interactions between temperatures of different land cover types located near each other, previously explored in the context of cooling by urban greenspaces (e.g. Chen and Wong 2006; Lin et al. 2015). In particular, a greater distance from buildings was associated with the formation of cold patches, which could be explained by increased radiative heat fluxes from the walls of buildings, shown to be warmer than rooftops in off-nadir thermal images (Voogt and Oke 2003), affecting the LST of their neighbours. Conversely, the less distinct impact of distances to paved areas on the LST of various land cover patches could be due to these areas lacking heat-accumulating vertical surfaces and therefore exerting a lesser impact on the LST of neighbouring land cover than buildings.

Our results indicated that, generally, higher spatial aggregation of patches of trees and lower aggregation of paved were associated with the coldest core land cover patches of different types and subtypes, with the opposite being true for the hottest patches, and with specific thresholds being fairly similar at both dates considered here, set a month apart. Nevertheless, depending on the SCD, there was a substantial overlap between patches of contrasting thermal properties, necessitating the use of quartiles (25th and 75th percentiles), which distinguished between the LST of land cover patches more reliably than means. The issue of thresholds was raised by Masoudi et al. (2019), who argued that it is impossible to state a definite value for minimum vegetation cover within an analytical unit due to unstable results obtained across 14 years in Singapore, attributing the differences to the “artefacts of current situation”. Moreover, other multi-temporal studies found that correlations between spatial configuration metrics and LST varied with season and year (Liu et al. 2016). Whilst it is difficult to establish specific thresholds for SCDs yielding a particular thermal effect, our study has demonstrated that a certain threshold exists beyond which increased fragmentation of urban form, and especially tree cover, is unlikely to yield cooling effects towards neighbouring land cover patches. This statement, however, excludes cooling from trees through shading, which could not be explicitly quantified in our experimental setup, but is an important heat mitigation measure at street level (Aleksandrowicz et al. 2017).
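The quartile comparison described above can be sketched as follows. This is an illustrative outline rather than the analysis code used in the study: the function names `interquartile_bounds` and `ranges_overlap` are hypothetical, and the choice of the inclusive quantile method is an assumption. Each input would hold the values of one SCD for the coldest or the hottest patches.

```python
from statistics import quantiles


def interquartile_bounds(values):
    """Return the 25th and 75th percentiles of `values`.

    Assumption: the 'inclusive' method, which treats the data as
    containing the minimum and maximum of the population.
    """
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q3


def ranges_overlap(cold_values, hot_values):
    """True if the interquartile ranges of the two groups intersect."""
    c1, c3 = interquartile_bounds(cold_values)
    h1, h3 = interquartile_bounds(hot_values)
    return max(c1, h1) <= min(c3, h3)
```

An SCD for which `ranges_overlap` returns False separates the two thermal groups over the central half of their distributions, even where the full ranges of the two groups overlap.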

Whilst a general consensus exists that greenspaces contribute to the cooling of urban areas through evapotranspiration and shading, and a correspondingly reduced heat storage capacity as compared to built-up spaces, our study has shown that grass had a more important cooling effect in June than in July, suggesting a reduction in its temperature regulation capacity as summer progresses, likely due to decreased soil moisture and the consequent reduced evaporation capacity of unirrigated grass patches. The use of trees for cooling can also have varied effects depending on climate, as demonstrated in a comparative study (Zhou et al. 2011), where a high edge density of trees was more important for cooling in the city of Sacramento, characterised by hot and dry summers, than in Baltimore, with hot and humid summers, where increased fragmentation was detrimental to cooling. The results of our study, associating well-aggregated patches of trees with the coldest land cover patches, are consistent with the results obtained for Baltimore. Moreover, tree community structure, including not only tree density and canopy dimensions but also species diversity, was found to influence the cooling capacity of urban greenspaces (Wang et al. 2021). Another consideration that is important for greenspace planning is that optimisation of spatial arrangement for day-time heat mitigation may diminish cooling at night, as shown in Zhang et al. (2017), because trees block part of the sky view and consequently reduce long-wave radiative cooling.

So far, this study has addressed the question of the types of spatial configuration descriptors of urban form, and their values, associated with the formation of the coldest and hottest patches of various land cover classes at microscales, without explicitly investigating the interactions between them. The presented analysis was necessarily simplified by averaging the values of SCDs for each land cover class present within the buffer zones to show overall trends affecting the LST of core patches, potentially overlooking specific effects of the spatial configuration of land cover on LST in areas characterised by very high spatial heterogeneity. Whilst urban form was likely relatively homogeneous at the small size of the patch neighbourhood (10 m buffer zone) considered here, our analysis could inform the compilation of a dataset simultaneously describing several properties of land cover patches and their neighbours in a qualitative manner, allowing more specific questions relevant to the design of urban form to be answered. The pool of spatial configuration descriptors used in our study could be expanded by incorporating information on the community structure and species diversity of tree patches, which were shown to be important for cooling (Wang et al. 2021), as well as descriptors of the heat storage capacity of construction materials, which can be approximated by albedo (Phelan et al. 2015).


Our study set out to determine whether it is possible to accurately explain the LST of individual land cover patches with the use of fine resolution LST imagery and spatial configuration descriptors of urban form, defined as the spatial aggregation of land cover patches, their area and heights, distances from land cover patches of different types, and elevation. We found that the spatial properties of urban form located in the immediate proximity of a land cover patch of any type had a greater influence on the LST of the patch than the spatial properties of the patch itself, confirming that appropriate urban form design can be used for temperature regulation in urban environments. The LST of less aggregated land cover patches of each type could be more effectively influenced by appropriate spatial configuration of urban form than the LST of more aggregated, and therefore often larger, patches. Elevation, followed by the aggregation level or distances to water or buildings, was the most important descriptor of LST. The coldest land cover patches were situated at relatively low elevations, in closer proximity to water, to more aggregated patches of trees and to less aggregated patches of paved, with the opposite trends being true for the hottest land cover patches of each type and subtype. As summer progressed, elevation and distance to water gained in importance in LST regulation over other factors, including aggregation of grass, which exhibited stronger LST regulatory properties in June than in July. Whilst the spatial configuration descriptors used in this study were capable of predicting the LST of core patches at a coarser spatial resolution, the accuracy of prediction was lower for buildings, as well as for more aggregated patches of all types, than when high spatial resolution LST was used, suggesting that fine resolution LST data are required for LST studies at micro-scales.
Future work should focus on elucidating the relationships between LST and the spatial configuration of urban form after controlling for the effects of elevation and distance to water, and on including descriptors related to urban fabric, such as albedo and tree community structure, which could improve the accuracy of the assessment and provide further insights for urban plans aimed at improved thermal comfort in towns and cities.