Recent decades have seen a rise in research (Wu and Ren 2019) regarding spatial configuration of urban form and its relationship to urban heat island (UHI) (Oke 1976) or surface urban heat island (SUHI) (Bärring et al. 1985) effects, deriving from concerns over climate change impacts on increased incidence of heatwaves (Perkins et al. 2012; Wouters et al. 2017) and related negative impacts on human health (Lin et al. 2009; Basara et al. 2010; Milojevic et al. 2011; Heaviside et al. 2016, 2017), among others, additionally aggravated by urban growth (Chapman et al. 2017; United Nations 2019).

The impact of urban form on UHI is often described through direct measurements of air temperature across different urban gradients (Schwarz et al. 2012; Lin et al. 2019) or through street-scale simulations (Sodoudi et al. 2018; Ramyar et al. 2019) allowing for micro-scale assessments. Such studies, however, take into account only a relatively small sample of observations and may not fully capture specific site effects elsewhere (Romero Rodríguez et al. 2020). On the contrary, the relationship of urban form and the SUHI effect is typically investigated from remotely sensed land surface temperature (LST) imagery at medium (30 m) to very coarse (1 km) spatial resolutions, offering an opportunity for city-wide assessments, however, compromising applicability of the results to micro-scales by summarising the results over larger subdivisions of land (Zhou et al. 2011, 2020; Kong et al. 2014; Liu et al. 2016; Simwanda et al. 2019; Masoudi et al. 2019). These studies commonly use landscape metrics (McGarigal 2015), pertaining to the field of landscape ecology, to elucidate the relationships between urban form and LST, and recommend deriving them from fine resolution land cover (LC) maps when the relationships are the strongest (Li et al. 2013). Use of medium to coarse resolution LST imagery within artificially superimposed land divisions allows for neighbourhood to district-scale assessments whose aggregated character may lack in detail specific to urban design conducive to thermal comfort outdoors (Perini et al. 2017; Li et al. 2020) or within building interiors (Futcher et al. 2013; Garshasbi et al. 2020).

We present a methodology that utilises very fine spatial resolution LC maps and selected class-level landscape metrics to generate a LC patch typology suitable for accurately depicting LST at a fine spatial resolution in three British towns. The LC patch typology is intended at facilitating urban design process by determining likely thermal responses of individual LC patches with specific spatial properties as well as support studies of urban thermal patterns associated with urban form. We verify the distinctiveness of the obtained LC patch typology by comparison to fine and medium resolution LST maps representative of two summer days a month apart as well as independent spatial configuration descriptors.

Materials and methods

Study area

The study area comprises three towns located in relatively close proximity in England: Milton Keynes (52° 0′ N, 0° 47′ W, appr. 122 km2), Bedford (52° 8′ N, 0° 27′ W, appr. 60 km2), and Luton/Dunstable (51° 52′ N, 0° 25′ W, appr. 86 km2) (Fig. 1) with population of 229,941, 106,940, and 258,018 (Office for National Statistics (2013) respectively and a temperate oceanic climate according to the Köppen–Geiger climate classification system. The three towns are characterised with contrasting histories: modern-day garden-city, medieval, and industrial, respectively, collectively representing a wide range of urban form patterns (Grafius et al. 2016; Zawadzka et al. 2019).

Fig. 1
figure 1

Land cover in A—Milton Keynes, B—Bedford, C—Luton/Dunstable. The insert depicts location of the towns within Great Britain. Analyses were carried out for areas within the ‘Built-up Area Extent’ boundary


This study required the use of land surface temperature (LST), land cover (LC) and feature height data for the three study areas. LST images were derived from Landsat 8 TIR bands using the split window algorithm as described in Jimenez-Munoz et al. (2014) for two summer dates: 6 June and 8 July 2013. Availability of cloudless images captured a month apart allowed for the assessment of the relationship between urban form patterns and LST over the course of warming summer.

A LC map was derived from NDVI generated from Colour-Infrared aerial imagery obtained from LandMap Spatial Discovery ( and British Ordnance Survey MasterMap, originally at 0.5 m spatial resolution (Grafius et al. 2016) and resampled with the nearest neighbour method to 2 m spatial resolution to reduce the data volume as well as match spatial resolution with available elevation and LST datasets. Five types of land cover are shown: grass, trees, paved, buildings and water (Fig. 1). Importantly, the use of a detailed topographic map during LC map production process allowed for accurate depiction of the building footprints and road layouts, which are oftentimes obscured by overhanging tree canopies in cases where maps are generated solely form NDVI.

Finally, feature heights were available at 2 m resolution. These were created based on a NERC-ARSF Leica ALS50-II LiDAR survey conducted over the three towns (Grafius et al. 2016).


The primary goal of this study was to develop a simple method for generation of sub-divisions of LC patches suitable for studies of urban thermal environments at very local scales, comparable to individual or small groups of patches, with the use of the k-means clustering approach. This section describes the steps required to develop and verify the refined LC maps, which are summarised in Fig. 2.

Fig. 2
figure 2

Methodological approach for determination of urban fabric patterns with distinct spatial and thermal properties

LST downscaling

Landsat 8 LST maps for the three towns at original 30(100)m spatial resolution were downscaled to 2(4)m resolution using Multiresolution Adaptive Regression Splines method and ancillary data including spectral indices and green-grey infrastructure footprints, described in detail in Zawadzka et al. (2019). The mixed spatial resolution of the coarse LST imagery stems from the fact that Landsat 8 TIR bands are captured at 100 m and are subsequently resampled, using the bilinear convolution method, by data provider (USGS—United States Geological Survey). The spectral indices used in LST downscaling were derived from visible and near-infrared bands at 2 m and short-wave infrared bands at 4 m resolution, resulting in an intermediate information footprint.

Spatial configuration metrics

Spatial configuration metrics used in this study included class-level landscape metrics and distances of LC patches to other patches of different type. A range of class-level patch aggregation and shape metrics (Table S1, Supplementary Materials A) was derived with the use of the Fragstats 4.2 software (McGarigal et al. 2012) from 2 m spatial resolution LC maps available for Bedford, Luton and Milton Keynes. The choice to use class-level landscape metrics, which describe spatial properties of all patches belonging to a given LC type within a particular landscape, was justified by a couple of considerations. Firstly, patch-level metrics were discarded due to one of the fundamental reasons for conducting this study, i.e. the tendency of individual patches derived from raster maps of LC to comprise LC fragments of contrasting spatial properties, especially when LC classes are well or appear to be well connected across the landscape. Examples of such LC types within urban areas include roads and other paved areas, water, and to certain extent—trees or grass. Secondly, landscape-level metrics were inadequate for the purpose of this study looking at the refinement of existing LC patches, as they return results pertaining to the entire landscape that cannot be attributed to an individual LC type.

Each metric was calculated over landscape represented by moving windows of varied sizes (10 m to 100 m every 10 m and 100 m to 200 m every 20 m) using a 4-cell neighbourhood rule indicating that, as opposed to the 8-cell neighbourhood rule, two adjacent grid cells in the raster map are treated as connected when they share a side but not a corner (Fig. 3). Excluding grid cell corners from the connectivity rule allowed for discernment between small patches, such as individual trees, or other patches separated by very narrow strips of land not depicted at 2 m resolution of the LC map. Window-based analysis, by focusing on a small portion of the study area at a time, allowed for calculation of metrics for individual sections of LC features, making the analysis relevant to microscales presumed in this study. Given considerable computation times at very fine spatial resolution used in this study, the entire set of metrics listed was derived for Bedford, characterised with smallest extent and somewhat intermediary spatial properties of urban form patterns when compared to Milton Keynes or Luton, and only the metrics with the strongest relationships to LST at both 2 m and 100 m spatial resolutions were generated for the remaining towns.

Fig. 3
figure 3

Demonstration of the a moving window and b cell neighbourhood concepts used in generation of landscape metrics from input LC maps. In moving window analysis, each cell of the output raster is assigned a result of a function calculated from all cells located within a moving window sliding across the input raster. The cell neighbourhood rule determines whether LC patches sharing a corner will be viewed as two separate patches (4-cell rule) or as a single patch (8-cell rule) by the Fragstats software

Distances of a given LC patch to other LC patch types were derived in ArcGIS 10.5 using the Euclidean distance tool, and were stored as raster layers covering the extents of the three towns.

Metrics selection

Shape or aggregation class-level landscape metrics for each LC type calculated within moving windows of varied sizes in Bedford were compared to LST at 2(4)m and 30(100)m resolutions on a pixel-by-pixel basis using the Spearman rank correlation coefficient (Spearman 1904) rho. Rho compares data ranks rather than actual values of two continuous variables and is therefore less sensitive to outliers or non-normal distributions in either of the variables (Puth et al. 2015), as was the case for class-level metrics computed within small moving windows. Due to pixel-by-pixel comparisons between values of the landscape metrics, assigned to each 2 m grid cell of LC map, and LST we did not deem it necessary to average LST over equivalent window sizes under an assumption of spatial autocorrelation of LST values (Yin et al. 2018) that would capture any effects of spatial configuration of LC on LST. Despite the expectation that the associations between landscape metrics calculated within smaller window sizes (10 to 100 m) and LST at 2(4)m resolution would be more appropriate than with the coarser LST data, the inclusion of the latter in the correlation analysis allowed for the verification of the observed relationship patterns obtained for the downscaled LST images in different LC classes, especially in search windows over 100 m in size, indirectly assuring validity of the results at the finer resolution.

Determination of two-tiered urban fabric patterns

Patterns of urban form were determined separately for each major LC class (buildings, paved, grass, trees, and water) based on a two-tiered unsupervised k-means clustering analysis. This approach ensured (a) independent from LST depiction of LC sub-divisions and (b) unbiased determination of fragments of each urban form type with specific thermal properties. The unsupervised, data-driven approach not only helped avoid bias in the estimation of spatial and thermal properties of the new LC patches, but also had practical connotations by minimising the chance for potential omission of important or overestimation of unimportant LC sub-divisions when a supervised method is used.

In Tier 1, class-level landscape metrics with the strongest association to LST in each LC class were clustered with the k-means method implemented in R statistical software and scree plots representing the within-groups sum of squares (WSS) were used to determine the optimal number of clusters for each LC class, resulting in maximally homogenous patches in terms of their spatial properties.

In Tier 2, another k-means run was carried out to determine LC patches located within each of Tier 1 clusters with distinct LST. This required that individual LC patches belonging to each Tier 1 cluster were attributed with the mean value of LST in June at 2 m resolution using the Zonal Statistics as Table tool in ArcGIS 10.5. Again, the optimal number of clusters was determined from inspection of scree plots of WSS. The use of the mean LST rather than a range of values within each Tier 1 patch prevented splitting of individual Tier 1 patches into two or more Tier 2 clusters.


Distinctiveness of clusters obtained in both tiers of the analysis was verified with pairwise Wilcoxon ANOVA analysis (R software) based on LST, selected class-level landscape metrics, elevations, feature heights (buildings and trees only) and distances to other LC classes.


Associations between LST and class-level landscape metrics

Inspection of Spearman correlation values (p < 0.05) for selected class shape and aggregation metrics with LST within different LC classes revealed that aggregation (Fig. 4) and not shape metrics were consistently and more strongly correlated with LST, depending on LC and search window size used to calculate the metrics.

Fig. 4
figure 4

Spearman correlations between selected class aggregation metrics and LST for 6th June and 8th July at 2 m and 100 m spatial resolutions in various land cover types for Bedford

Class aggregation metrics with strongest correlations to LST included COHESION and PLADJ for all LC classes, except for water, and LSI for grass and trees. The correlations were stronger in June than July and comparable in magnitude between respective months at both spatial resolutions—2 and 100 m. Correlations tended to rise with increasing search window size, achieving the strongest constant value at approximately 100 m for 2 m and continuing to rise slowly beyond that size for 100 m resolution LST data.

At 100 m window size for 2 m LST in June, the strongest correlations were observed for greenspaces, with grass and trees being positively correlated with LSI (0.57 and 0.53) and negatively correlated with COHESION (− 0.59 and − 0.60) and PLADJ (− 0.62 and − 0.66). Correlations between COHESION and PLADJ and LST for built-up spaces and water were weaker: 0.42 and 0.36 for buildings, 0.36 and 0.31 for paved, and 0.17 and 0.1 for water, respectively.

The strongest correlations within class shape metrics were observed for CONTIG_MN, PARA_MN and SHAPE_MN (Fig. S1 in Supplementary Materials), however, here the window size with the strongest relationship was relatively small (~ 40 m) for greenspaces and large for built-up areas (~ 100 m). This inconsistency coupled with strong search window artefacts visible in the raster layers for shape metrics lead to their rejection as candidates in this study.

Spatial and thermal patterns of urban form

K-means clustering of three class aggregation metrics (COHESION, PLADJ, LSI) for grass and trees, and two class aggregation metrics (COHESION and PLADJ) for paved, buildings and water yielded spatially distinct patterns of urban form within each LC type (Figs. S1 and S2 in Supplementary Materials A). Each Tier 1 cluster could be attributed with distinct values of the class aggregation metrics, average distance to other LC classes, elevation, feature heights, and LST (Table 1, also Tables S2–S4, and Supplementary Materials A). ANOVA has shown that means of COHESION, LSI, PLADJ, and LST (except for one pair of T1 clusters in water) were significantly different (p < 0.001) for each pair of T1 cluster within each LC class. A great majority of cluster pairs had also significantly different distances to other LC types, with well-justified exemptions of distances of residential patches of trees to grass, and few others for water.

Table 1 Properties of Tier 1 clusters

Tier 2 clustering sub-divided each Tier 1 cluster into four thermal categories—coldest, hottest, and two intermediary classes: medium-cold and medium-hot, with statistically different June and July (2 m) LST means (Fig. S3 in Supplementary Materials A and Supplementary Materials B). ANOVA carried out on all other diagnostic variables implied that resulting Tier 2 clusters have largely been distinct not only thermally but also spatially, with exceptions that were most common in water and also occurring in buildings, and very rarely in the remaining LC types. Overall, the two-tiered unsupervised k-means clustering procedure was capable of generating a representation of urban fabric composed of five major LC types subdivided into clusters with distinct spatial and thermal properties (Fig. 5).

Fig. 5
figure 5

Examples of Tier 1 Clusters in a Buildings—B, b Paved—P, c Grass—G, d Trees—T, e Water—W. Arrows point to the Tier 1 Cluster intended for representation in each image tile. Legend is ordered according to decreasing patch aggregation levels


The urban LC patch typology developed in this study was intended at differentiating sub-divisions of main LC types relevant for urban thermal studies at micro-scales, i.e. areas 1–104 m2 in size, that are required for studies contributing to climate sustainability of urban design (Georgescu et al. 2015). Whilst micro-scale studies using simulation models of urban thermal environments exist (Perini et al. 2017; Sodoudi et al. 2018; Ramyar et al. 2019), they often utilise unrealistic models of urban form, resulting in crude estimates, (Li et al. 2020) that could be substituted by excerpts from the typology developed here. In fact, urban climatology is known for attempts to stratify urban form into morphological areas contributing to homogenous thermal responses, an example of which is given by the urban climate zones (UCZs) developed by Stewart and Oke (2012) and pertaining to neighbourhood scales. Our typology, which combines patch-level detail with city-scale thermal zoning, can support research aiming derivation of UCZs (Lee and Oh 2018; Xu et al. 2019) in an automated manner by extracting individual LC patches with spatial properties related to their LST, especially when additionally attributed with heights of buildings being one of the differentiating factors in the UCZ classification. Further practical implications include the opportunity created by this typology to carry out studies of the relationship between LST and urban form at scales relevant to outdoor comfort of pedestrians or in the interiors of buildings, taking into account interactions with neighbouring LC patches (Zawadzka et al., In Preparation).

During development of the urban LC typology presented here a number of shape and aggregation landscape metrics that had previously been used in studies pertaining to finding relationships between LST and urban form (Zhou et al. 2011; Li et al. 2011; Wu et al. 2014; Chen et al. 2014; Gage and Cooper 2017; Sodoudi et al. 2018) were tested for strong correlations with LST. Technical considerations of working with LC map in a raster format and the intention to automatically determine individual LC patches of each main LC type with unique spatial properties enforced a moving window analysis for calculation of the landscape metrics at LC class-level. The use of moving widows caused the possibility of inclusion of spatial properties of grid cells belonging to adjacent LC patches into the calculations related to the focal patch, which could lead to erroneous assignment of their spatial properties, exacerbated only in cases when adjacent LC patches had very contrasting properties and the search window was excessively large. This effect could be regarded as largely negligible given a certain level of spatial homogeneity of urban form due to planning of neighbourhoods (Cortie 1997).

Nevertheless, the LC typology was intended at stratification of urban form for use in studies of urban thermal environment at micro-scales, motivating the selection of both the type of metrics and window size most strongly correlated to LST at 2 m resolution. The correlation values pointed to highest suitability of the moving window 100 × 100 m in size for each LC class, which assured consistency of any subsequent analyses, however, could potentially be an artefact of the 100 m spatial resolution of the thermal infrared sensor mounted on the Landsat 8 satellite. The strength of correlation depended not only on search window size used in Fragstats calculations but also on LC and metric type. The correlations for aggregation metrics within LC classes with LST exhibiting a relationship with LST where strongest at 100 m search window size both for green and grey spaces, and at 40 to 80 m for selected shape metrics within greenspaces with varied effects for buildings and paved. Weaker correlations with water, especially with LST at 2 m resolution, could be attributed to the downscaling procedure applied to coarse resolution LST data not depicting the thermal response of water bodies correctly, especially for narrow elongated features easily affected by the mixed-pixel effect (Yow 2007). Effects of search window size on correlations with LST have not previously been investigated citywide and separately for each LC class within one study, potentially due to high computational demand of these calculations. Nevertheless, correlations for aggregation metrics with 100 m resolution LST still showing an increasing trend for windows 200 m in size indicated that larger window sizes are appropriate for coarser resolution LST data. Weakening of the correlations for LST in July when LST was on average 3.7 K higher is in concordance with Li et al. (2011) who observed significant correlations between landscape metrics and LST in spring rather than in summer, and suggests changes in LST regulatory capacity of urban form patterns as the temperatures rise.

The use of three types of class aggregation descriptors in the LC typology, COHESION, LSI, and PLADJ, allowed for sub-division of each LC type according to different perspectives, ensuring comprehensiveness of the approach (McGarigal 2015). COHESION is a measure of physical connectedness of a patch type expressed through the ratio of its perimeter to its area and the size of the landscape (i.e. search window), and as such focuses on the spatial properties of the focal patches, excluding the impact of their neighbours of the same type. PLADJ, on the other hand, analyses the landscape in search of adjacencies between patches of the same type and consequently relates their aggregation to the level of their fragmentation within a specified area. Here, the 4-cell neighbourhood rule used in the calculation of the metrics is pivotal in separating small, closely located patches of LC that should be treated as separate entities, such as individual trees. LSI complements COHESION and PLADJ by looking at the edge density of a LC class in the landscape and therefore relating the outcome to the shape of patches forming the class. Class-level landscape metrics used in this study are affected by sensitivities with regards to the size and aggregation of patches in the landscape (Neel et al. 2004). Changes in aggregation level described by PLADJ and COHESION may be difficult to distinguish from the change in patch size due to strong interactions between patch area and aggregation level within a landscape observed for these metrics, constituting a potential disadvantage depending on the requirements of subsequent studies. LSI has a tendency to display a parabolic relationship between patch size and aggregation level, however, not in natural landscapes, when the relationships are linear, i.e. higher LSI associated with lower patch area and aggregation level. This could also explain good correspondence of LSI of grass and trees to LST and not built and paved classes, which can be roughly characterised with high aggregation and low area or low aggregation and high area, respectively. From the pool of remaining class aggregation metrics considered in this study, AI had similar correlation values to PLADJ, however, its use was discarded due to a tendency to provide misleading estimates when area of the class in the landscape exceeds 50% and having similar meaning to PLADJ (Neel et al. 2004). CLUMPY and IJI had relatively high correlations with LST for LC classes representing greenspaces, however, CLUMPY is similar to PLADJ by considering grid cell adjacencies and IJI returns valid values only when there are at least three different classes in the considered landscape (McGarigal 2015).

The development of the LC typology presented in this study involved using pixel-based clustering techniques, which have rarely been used in studies relating landscape metrics to LST, with only Gage and Cooper (2017) having deployed hierarchical clustering to identify LC typologies within predefined parcels of land—5 ha hexagons—rather than subtypes of a given LC class as is the case in our study. In fact, this is the first known to the authors study attempting to sub-divide existing maps of LC into groups of patches with unique spatial configuration properties within a single LC class. The unsupervised k-means clustering approach was capable of discerning sub-divisions of LC in a manner convincing to the human eye that could be further subdivided into four thermally distinct subclasses in buildings, paved, grass and trees. Whilst hierarchical object-oriented approaches (e.g. Chen et al. 2009; Grippa et al. 2017) for LC classification could constitute an alternative way for generation of similar LC typologies, K-means clustering has the advantage of easy implementation with the use of any statistical software. Moreover, our approach combining pixel-based and moving window analyses allowed for consideration of entire patches of a given LC in the formation of the typology rather than their fragments trimmed by superimposed artificial land parcel boundaries.


Two-tiered unsupervised k-means clustering approach presented in this study was successful at depicting both spatially and thermally distinct subdivisions of major LC classes in medium sized towns relevant to studies of the relationship between LST and urban form patterns at very fine (2 m) spatial resolution. Whilst investigation of all effects of spatial configuration of urban form on the LST observed in Tier 2 clusters is still ongoing (Zawadzka et al. In Preparation), this study has revealed that relationships between class-level landscape metrics and 2 m resolution LST are strongest at smaller parcels of land than in the case of coarser resolution LST datasets investigated in other studies, and that these relationships weaken as the summer progresses. This study has also shown that aggregation (LSI, COHESION, PLADJ) and not shape metrics frequently used in other studies investigating relationships between urban form and LST are important for explanation of LST at fine spatial resolution. Correlations between the class aggregation metrics and LST, investigated as part of the secondary objective, were the strongest when a search window 100 × 100 m in size was used to derive them from raster LC maps and were stronger in vegetated than non-vegetated LC classes. This proved that consideration of the interactions between technical aspects of landscape metrics’ computation and LST is important for accurate depiction of urban form patterns with applications in urban thermal environment studies.