Abstract
Urban density is central to urban research and planning and can be defined in numerous ways. Most measures of urban density, however, are biased by the arbitrarily chosen spatial units in their denominator and ignore the relative location of elementary urban objects within those units. We address these two problems by proposing a new graph-based density index, which we apply to the case of buildings in Belgium. The method includes two main steps. First, a graph-based spatial descending hierarchical clustering (SDHC) delineates clusters of buildings with homogeneous inter-building distances. A Moran scatterplot and a maximum Cook’s distance are used to prune the minimum spanning tree at each iteration of the SDHC. Second, within each cluster, the ratio of the number of buildings to the sum of inter-building distances is calculated. This density of buildings is thus defined independently of any basic spatial unit and preserves the built-up topology, i.e. the relative position of buildings. The method is parsimonious in parameters and can easily be transferred to other point objects or extended to account for additional attributes.
1 Introduction
Density is the ratio of a mass (typically a number of individuals, jobs, or buildings) to a given two-dimensional reference, i.e. an area. In urban and planning practice, different density metrics are used (Churchman 1999; Longley and Mesev 2002; Angel et al. 2021) to represent the functioning of a city and, more particularly, to hint at the sustainability or liveability of urban forms (Pauleit and Duhme 2000; Boyko and Cooper 2011; Ewing et al. 2018; Rinkinen et al. 2021; Martino et al. 2021). Density metrics, however, are subject to two major issues: they use a reference area that is not necessarily related to the research goals, and they ignore the relative locations of spatial elements within that reference area, thus aggregating without ensuring the internal homogeneity of the area. These problems were already stressed in the axiomatic approach to geographical space of Beguin and Thisse (1979), who showed that the elementary area considered to measure a density cannot be separated from the metric of the relative location of places, namely the topology.
We draw on this inherent property of geographic space and on more recent work by Caruso et al. (2017) to propose a method that computes a topology-based density index. The idea is to compute the index not from a reference surface but from a graph that connects spatial objects (points) via edges. By weighting the edges of the graph with the distance between points, we obtain a graph that preserves the relative position of the points with respect to each other. One can then cut the graph to obtain groups of points whose relative distances are homogeneous, and thus a homogeneous base (topology) for computing an aggregate index such as density. We suggest a novel spatial descending hierarchical clustering (SDHC) method to cut the graph, in which the most locally dissimilar edges are removed iteratively. Rather than an exogenous cut-off, we use Cook’s distance on the regressions of Moran scatterplots, thus directly using local topological information at each stage.
We focus here on the case where points are buildings. Buildings, together with plots and streets, are the elementary constituents of urban space (Moudon 1997). We illustrate our method by proposing a building density index, a key index in urban planning. Hence, we move the measurement of building density from an area-based problem to a graph-based problem. Our density metric preserves the relative location, i.e. the topology, of the buildings.
This research can be seen as a contribution to a strand of the urban literature where density is complemented by morphological indices (e.g. Galster et al. 2001; Berghauser Pont and Haupt 2005; Sémécurbe et al. 2019; Fleischmann 2019, 2021). Contrary to existing work in this domain, we dispense with the definition of a reference spatial unit. Our work can also be seen as extending another line of research where topological graphs are used to capture the local spatial organisation of buildings (e.g. Caruso et al. 2017; Wu et al. 2018, for recent examples). Compared to this second strand, we move a step forward by proposing to use Moran scatterplots not only for describing but also for cutting the graph, and by adding a density measure.
In the next section, we position our work within broader urban morphology research and with regard to recent methods applied to graphs of buildings. We then present the data inputs and the different steps of our methodology (Sect. 3), followed by the results of an application to all buildings located in Belgium (Sect. 4). We discuss our findings in Sect. 5 and conclude in Sect. 6.
2 Background
2.1 Density metrics and the urban space
The interplay between forms, structures and processes is central to urban science and to understanding how cities are made up: from form we infer the processes that create the structures we see in cities, which enables us to build models of these processes that in turn simulate forms (Batty 2013, p. 79). The physical part of the city, essentially buildings, roads and plots, reveals the presence of inhabitants, activities and hence movements (flows), which themselves relate to planning and normative issues: what is the “good” city form (Lynch 1984) given its environmental context and use? How to create a sustainable city? How to transform a city into a more sustainable one? For a while, the idea was to reduce distances and increase density to save energy and space (Williams et al. 2000). Today, many studies and authors (see for example Berghauser Pont et al. 2021, for a review of the impacts of densification) bring nuance to this statement by showing that city compaction and densification are not always the way to more sustainability. However, metrics of “densities” remain central in debates on how to deliver “better” cities, at least in the policy arena.
Two main approaches are used to study urban forms: the discrete object approach in discrete areas and the network approach (Berghauser Pont 2021). In the first approach, researchers measure morphometric characters such as size, shape, and intensity (see Fleischmann et al. 2021, for a review) of discrete elements such as buildings, streets and plots in more or less complex discrete areas (see for example Berghauser Pont and Haupt 2005; de Bellefon et al. 2019; Arribas-Bel et al. 2019; Godoy-Shimizu et al. 2021; Fleischmann et al. 2021). The morphometric characters are then often associated with the evaluation of the urban form in terms of, for example, liveability (Martino et al. 2021), waste production, traffic volume, water and energy consumption (Pauleit and Duhme 2000), heat islands and air flow (Boyko and Cooper 2011), or urban vitality (Bobkova et al. 2017). It is not merely the density of these structures that matters, however, but also their geometry at specific scales (Schirmer and Axhausen 2015). Batty (2013, p. 180) suggests that spatial interactions and the functioning of connections within cities “need to be physically rooted in the detailed geometry of buildings”. Similarly, for Gehl (1987, p. 83), building density “says nothing conclusive about whether human activities are adequately concentrated. The design of buildings in relation to relevant human dimensions is crucial”.
The network approach does not focus on discrete objects but on systems of objects. Researchers study the street network in particular, which implies the study of network structure, connectivity, centrality, hierarchy, etc. (Marshall et al. 2018). The Space Syntax movement, initiated by Hillier’s seminal work (Hillier 1996), is one of the precursors of this approach, with attention given to the relative position of lines, while more recent publications focus rather on ubiquity across cities and the massive use of data (see particularly Boeing 2017). Studying the spatial configuration of street elements allows the urban form and its impacts to be measured. For example, Berghauser Pont et al. (2019) study the centrality of the road network and its impact on pedestrian movement in three cities. Studies of street networks predominantly use graph-based methods (Marshall et al. 2018). In such studies, the graph can be a primal graph (for example, street intersections are the nodes of the graph and streets are the edges) or, as is the case for Space Syntax, a dual graph (streets become nodes and street intersections become edges) (Porta et al. 2006).
In the same line of research as Space Syntax, the present study works with dual connectivity graphs. As inspired by Caruso et al. (2017), Euclidean segments between buildings are computed as the edges of a primal graph. The idea is then to characterise those edges according to their connectivity. A dual connectivity graph is then computed with the Euclidean segments as nodes. The connectivity between these nodes, expressed by the edges of the dual graph, is a function of the presence of nodes (buildings) in the primal graph.
Leaving aside the problem of flows (e.g. Andrienko et al. 2010; Hurvitz et al. 2014), switching from physical structures to functions often leads researchers and planners to switch to areal objects, and to count population or activities over a given surface (buildings, parcels, grid cells, etc.). A reference basic spatial unit (BSU), selected a priori, is often used to measure a density index or a more complex metric. A good example is the set of urban metrics proposed by Galster et al. (2001) to go beyond density when measuring sprawl. Each of their eight indices, not just the average density, requires a count of population or land use category over a set of exogenous grid cells, before being further aggregated over an urban region. While structures and arrangements of population and land uses are definitely picked up at the scale of an entire city or neighbourhood, there is still an aggregation process beforehand, and hence information loss, depending on the resolution and placement of the grid or on the original recording units (e.g. census tracts). These zonal and scale effects, known as the Modifiable Areal Unit Problem (MAUP) (Openshaw 1983), inevitably bias measures of urban form such as density (Zhang and Kukadia 2005). In order to avoid the biases due to the use of surfaces, we compute here metrics associated with networks, following earlier works by Flahaut et al. (2003) and Okabe et al. (2009).
Overall, the density of buildings is a major indicator for urban planning, but can only properly be used when two conditions are met: (i) density is complemented with indicators describing the relative spatial organisation of buildings and (ii) measurement biases due to the use of basic spatial units (BSU) are overcome.
2.2 Methodological advances
Regarding the first of the two conditions above, progress has been made since Galster et al. (2001), especially towards deriving composite multifactorial indicators. The use of multiple attributes can depict how urban densities are experienced by inhabitants and users (Teller 2021). Caruso et al. (2017) reviewed some of these multi-indicator contributions, mostly directed at capturing sprawl. They usually consider urban land pixels, not buildings. Teller (2021), Godoy-Shimizu et al. (2021), Berghauser Pont and Haupt (2005) and Araldi and Fusco (2019) are examples of multifactorial methods applied to the distribution of buildings in relation to density. However, these indicators are defined over discrete zones (BSUs) and therefore do not meet the second condition.
Regarding the second condition, i.e. avoiding the bias of a basic spatial unit (BSU), two strategies have been adopted so far: (i) spatial or statistical smoothing across the BSUs and (ii) clustering individual data with standard (k-means, etc.) or more complex classification methods (artificial neural networks, etc.). de Bellefon et al. (2019) is a recent example of smoothing. The authors use a kernel function applied on a very fine grid resolution to avoid losing information about the internal spatial organisation while aggregating. In essence, however, kernels still depend on an a priori defined surface (bandwidth), even if it is often optimised for each case study (similarly to geographically weighted regression approaches with optimum bandwidth (Brunsdon et al. 1996)). Although exceptions exist, such as Araldi and Fusco (2019) among others, in many cases smoothing methods are applied uniformly across space and may not fit the local composition everywhere. Let us take Fig. 1a as an example of a spatial pattern of buildings. If we compute density using a smoothing grid (kernel) (Fig. 1b), we can see that the topology is lost and that the grid prevents the detection of groups of buildings with a similar topology. Regarding (ii), a recent example of density-based spatial clustering is A-DBSCAN (Approximate Density-Based Spatial Clustering of Applications with Noise) by Arribas-Bel et al. (2019), building on earlier work by Ester et al. (1996). Buildings are grouped according to a density criterion in order to draw city boundaries. Similar to other density-based methods, A-DBSCAN is not parameter-free (it requires a minimal number of buildings and a maximal distance between them in a cluster), and results can thus vary greatly from one user to another. Furthermore, these two criteria are applied uniformly over the study area, which prevents the detection of locally specific patterns. If we now apply A-DBSCAN (Fig. 1c) to our example, we see that buildings with different visual topologies are clustered into one large group (blue). Hence, with this method the topology is partly lost.
To avoid the vanishing of topologies due to spatial aggregation, Zhang and Kukadia (2005) suggest creating BSUs that make sense with regard to the initial spatial organisation of the disaggregated data. This goal is pursued by density-based or graph-based clustering methods (Wu et al. 2018; Deng et al. 2011). It is also pursued by Fleischmann et al. (2020) and by Schirmer and Axhausen (2015), who perform clustering, local spatial statistics, and spatial smoothing within the topological constraints of building-based tessellation adjacencies or street network topology. In both publications, BSUs are adapted in size and shape to the urban intensity. We pursue the same objective in this paper by presenting a graph-based clustering method that provides BSUs that make spatial sense in terms of the distribution of buildings (i.e. topology).
In graph-based methods, the nodes are typically the buildings (primal graph) and the edges the inter-building segments computed from their centroids or from their boundaries. The advantage is that information about the absolute location of buildings as well as their relative location is conserved (Anders et al. 2001; Assunção et al. 2006; Caruso et al. 2017; Wu et al. 2018). Most graph-based methods start with a Minimum Spanning Tree (MST), which is easily partitioned into subgraphs, i.e. clusters. Each time one edge is removed, two distinct subgraphs are created. If \(n\) edges are removed from the initial MST, \(n+1\) clusters of buildings appear. Different strategies are available to determine which edges should be removed to perform the spatial clustering. Caruso et al. (2017) remove all edges longer than an a priori threshold fixed at 200 metres. But this unique threshold cannot fit the local spatial organisation of buildings everywhere (Fig. 1d). Zahn (1971) and Yu et al. (2014), drawing on Gestalt theory, remove the edges that differ most from their set of contiguous edges according to three parameters determined a priori (\(p_1\), the number of contiguous neighbours; \(p_2\), a ratio between the length of an edge and the average length of its neighbours; \(p_3\), the ratio of the difference between the length of an edge and the average length of its neighbours to their standard deviation). Results will depend upon these thresholds. While Fig. 1e shows interesting results overall, the pattern of buildings to the northeast of the area is very poorly captured by the method. Assunção et al. (2006) use a Spatial ‘K’luster Analysis by Tree Edge Removal (SKATER), in which they iteratively suppress the edge whose removal minimises the sum of the intra-cluster variances. In this case, the number of final clusters has to be fixed a priori.
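The link between edge removal and clusters can be illustrated with a minimal sketch. It builds an MST with Prim's algorithm on a handful of hypothetical centroids, then prunes it with a fixed 200 m threshold in the spirit of the threshold strategy described above. The coordinates, function names and pure-Python implementation are assumptions made for illustration; this is not the code used in any of the cited studies.

```python
import math

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns (u, v, length) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n          # cheapest known connection of each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def clusters_after_pruning(n_points, edges, max_length):
    """Drop edges longer than max_length and label the connected components."""
    label = list(range(n_points))

    def find(i):
        while label[i] != i:
            label[i] = label[label[i]]   # path compression
            i = label[i]
        return i

    kept = [e for e in edges if e[2] <= max_length]
    for u, v, _ in kept:
        label[find(u)] = find(v)
    return [find(i) for i in range(n_points)], len(edges) - len(kept)

# Hypothetical centroids (metres): two tight groups and one isolated building.
centroids = [(0, 0), (10, 0), (20, 0), (500, 0), (510, 0), (1500, 0)]
mst = minimum_spanning_tree(centroids)
labels, n_removed = clusters_after_pruning(len(centroids), mst, max_length=200.0)
n_clusters = len(set(labels))   # equals n_removed + 1, since the MST is a tree
```

Because the MST is a tree, every removed edge splits exactly one component in two, which is why the number of clusters always equals the number of removed edges plus one.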
Following the axiomatic approach of Beguin and Thisse (1979), we know that the denominator of our density metric must preserve the relative positions of buildings everywhere in space. We thus follow graph-based approaches where inter-building distances are used to build and then to prune the MST. We propose a new segmenting (clustering) algorithm that requires neither an exogenous threshold applied uniformly across the area nor a number of clusters defined beforehand. Our strategy, inspired by the LISA approach of Caruso et al. (2017), is to compute a Moran scatterplot of inter-building distances for each graph (subgraph) and to remove the main outlier for segmenting, rather than applying a distance threshold. As a result, the segmentation can differ across the area and capture distinct local topologies. Density is then computed on these topologically homogeneous clusters. Morphometric indices other than density could also be calculated on these clusters.
3 Materials and methods
3.1 Data input and study area
Our method is applied to all buildings located in Belgium. Data are provided by the Land Registry Administration of Belgium (\(\copyright\) 2018 Administration Générale de la Documentation Patrimoniale). All buildings are used regardless of their function: each house (detached or semi-detached), office building, shop, garage, church or factory is kept in the database. In order to avoid the noise generated by very small buildings such as garden sheds, all built polygons smaller than \(12\,\mathrm{m}^2\) were removed from the database, as was done by Montero et al. (2021). The database includes 5,726,804 buildings, which leads us to chunk the data for computation (see “Appendix 1”). Figure 2 shows the study area and a zoomed map of the buildings’ footprints.
3.2 Methods
Our objective is to create a topology-based density index that preserves the local spatial organisation of the buildings. Hence, after preprocessing, our method comprises two main steps: a clustering method (step I) leading to topologically consistent groups, and a density computation (step II). The process and the outcome of each step are summarised in Fig. 3 for a toy example (Fig. 3a).
3.2.1 Step 0: preprocessing
The input data consist of polygons. Although the size and shape of the polygons can be heterogeneous across space and can affect the distribution of inter-building distances, we ignore those shapes by retrieving centroids (step 0, see Fig. 3b). The distance between buildings is here the distance between the centroids of the buildings. We could have followed Yu et al. (2014), who measure the actual distance between buildings, but this would not be appropriate in our case. Indeed, we want to measure the density of buildings per km of graph. By using the distance between centroids, the user of the metric can say “When I walk 1 km along the graph, I encounter x buildings”. With the actual distance between buildings, this interpretation of the metric would no longer hold: the user would have to imagine being teleported from one side of each building to the other while travelling along the graph. Moreover, the actual distance between two buildings may be zero (e.g. terraced houses), which leads to an indeterminacy (the denominator is null).
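As an illustration of this preprocessing, the sketch below computes the area and centroid of a footprint with the shoelace formula and drops footprints below the 12 m² size threshold mentioned in Sect. 3.1. The footprints, the function name and the representation of a polygon as a list of (x, y) vertices in metres are hypothetical; in practice a GIS library would normally handle this step.

```python
def polygon_area_centroid(ring):
    """Area and centroid of a simple polygon given as a list of (x, y) vertices."""
    a = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0          # shoelace term for this edge
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5                               # signed area (sign depends on orientation)
    return abs(a), (cx / (6 * a), cy / (6 * a))

MIN_AREA_M2 = 12.0                         # size threshold stated in Sect. 3.1

# Hypothetical footprints (metres): a 10 m x 8 m house and a 3 m x 3 m garden shed.
footprints = [
    [(0, 0), (10, 0), (10, 8), (0, 8)],
    [(20, 0), (23, 0), (23, 3), (20, 3)],
]
centroids = []
for ring in footprints:
    area, centroid = polygon_area_centroid(ring)
    if area >= MIN_AREA_M2:                # the shed (9 m2) is filtered out
        centroids.append(centroid)
```

Only the centroid list is passed on to the graph construction, so footprint size and shape play no further role once this step is done.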
3.2.2 Step I: spatial descending hierarchical clustering (SDHC)
This step starts with a minimum spanning tree (MST) where the nodes are the centroids of the buildings and the edges are the inter-building segments. Euclidean distances are used as weights (Fig. 3c) when computing the MST, which we denote by G. A spatial descending hierarchical clustering (SDHC) is then applied to G to iteratively define subgraphs SG by removing one edge from each parent graph. The SDHC process is explained below and flowcharted in Fig. 4. It is carried out on graphs of at least 30 vertices in order to avoid statistical problems due to small numbers.
A Moran scatterplot (Anselin 1995) and a maximum Cook’s distance are used to identify the edge that should be removed at each iteration of the SDHC. The Moran scatterplot (Fig. 3d) shows how the edges are spatially associated locally. One point represents one edge of the MST. On the x-axis is the length of an edge, i.e. the distance between two connected buildings, and on the y-axis its spatial lag, i.e. the weighted average length of its contiguous edges. We deliberately restrict the computation of the spatial lag of an edge to its contiguous neighbours of order 1 because we want to detect breaks between direct neighbours. A topological depth greater than 1 would smooth the spatial lag of each edge. Details about the spatial lag computation can be found in “Appendix 2”.
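The construction of the scatterplot coordinates can be sketched as follows. Two edges are treated as contiguous of order 1 when they share a node, and the spatial lag is taken as the plain (equally weighted) mean of the contiguous edge lengths. The equal weighting is an assumption made here for simplicity, since the actual weights are specified in Appendix 2; the edge list and function name are likewise hypothetical.

```python
from collections import defaultdict

def spatial_lag(edges):
    """Mean length of the order-1 contiguous edges (edges sharing a node) of each edge."""
    incident = defaultdict(list)            # node -> indices of the edges touching it
    for idx, (u, v, _) in enumerate(edges):
        incident[u].append(idx)
        incident[v].append(idx)
    lags = []
    for idx, (u, v, _) in enumerate(edges):
        neighbours = {j for node in (u, v) for j in incident[node] if j != idx}
        lengths = [edges[j][2] for j in neighbours]
        # An edge without any contiguous edge has no defined lag.
        lags.append(sum(lengths) / len(lengths) if lengths else float("nan"))
    return lags

# Hypothetical MST edges (node_a, node_b, length in metres) along the path 0-1-2-3-4.
edges = [(0, 1, 10.0), (1, 2, 12.0), (2, 3, 90.0), (3, 4, 11.0)]
lags = spatial_lag(edges)
# One scatterplot point per edge: (edge length, spatial lag).
scatter = [(length, lag) for (_, _, length), lag in zip(edges, lags)]
```

In this toy path, the 90 m edge has a lag of only 11.5 m: it is a long edge surrounded by short ones, which is the kind of point that sits far from the regression line of the scatterplot.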
A linear model (OLS) is then estimated on the scatterplot, the slope of which indicates the global level of spatial autocorrelation (Moran’s I, see Anselin (1995)). In most cases, we expect a positive slope, meaning that a long (short) separation between two buildings is found in neighbourhoods where distances between buildings are long (short) on average, i.e. in topologically homogeneous cases. The slope will be insignificant in cases where the relative distances between buildings, and thus the topology, are more heterogeneous. As pointed out by Anselin (1996), points in the scatterplot that are extreme with respect to the central tendency reflected by the regression slope may be outliers in the sense that they do not follow the same process of spatial dependence as the bulk of the other observations. We build on this property and use the maximum Cook’s distance to identify the most extreme point (outlier) of each graph G, which is the edge to be removed to obtain two subgraphs (SG1 & SG2) (Fig. 3f) featuring two topologically distinct clusters of buildings.
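A minimal sketch of this outlier selection is given below. It fits a simple OLS regression of the spatial lag on the edge length and computes the standard Cook's distance of each point, \(e_i^2 h_i / (p\, s^2 (1-h_i)^2)\) with \(p = 2\) estimated coefficients, where \(e_i\) is the residual, \(h_i\) the leverage and \(s^2\) the residual mean square. The data and names are hypothetical, and the variables are left unstandardised for simplicity, which is an assumption rather than a statement about the original implementation.

```python
def cooks_distances(x, y):
    """Slope and Cook's distances for the simple regression y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    p = 2                                             # intercept and slope
    mse = sum(e ** 2 for e in residuals) / (n - p)    # residual mean square
    leverage = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    cooks = [e ** 2 * h / (p * mse * (1 - h) ** 2)
             for e, h in zip(residuals, leverage)]
    return slope, cooks

# Hypothetical Moran scatterplot: edge lengths (x) and their spatial lags (y), in metres.
lengths = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 80.0]
lags = [11.0, 12.0, 15.0, 15.0, 19.0, 19.0, 23.0, 15.0]
slope, cooks = cooks_distances(lengths, lags)
# Index of the edge with the maximum Cook's distance: the candidate for removal.
candidate = max(range(len(cooks)), key=cooks.__getitem__)
```

Here the 80 m edge, much longer than its neighbours, is selected as the candidate: it combines a high leverage with a departure from the trend followed by the other edges.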
In order to determine whether the removal of the outlier leads to the creation of more homogeneous subgraphs, tests of variance (Brown and Forsythe 1974) between the parent graph (G) and each of the child subgraphs (SG1 & SG2) are performed (Fig. 3e). These tests assess whether the variance of the edge lengths in at least one of the two subgraphs is statistically different from that of the parent graph. If the null hypothesis (equality of variances) is rejected, the edge is removed, and the algorithm is rerun separately on each of the subgraphs. If the null hypothesis is not rejected, the edge is not removed, and the algorithm stops. Technically, in order to perform a relevant variance test, one first needs to make sure that the parent graph is not very large (thus bearing a lot of heterogeneity) and that each child graph is made of more than one point. Hence, the following three conditions are used to determine whether the observed outlier is removed or not (Fig. 4):

(1) The total length of the MST is higher than 10,000 metres (see “Appendix 3” for details).
(2) One of the subgraphs is made of a single vertex.
(3) The variance of the length of the edges in at least one of the two subgraphs is statistically different from that of the graph before edge suppression.
Step I is completed when no more edges can be removed (no significant outlier) from any of the subgraphs. Each resulting subgraph is thus a topologically homogeneous cluster of buildings.
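The variance comparison used in this step can be sketched with the Brown-Forsythe statistic, i.e. a one-way ANOVA F statistic computed on the absolute deviations of each observation from its group median. The sketch only returns the statistic and its degrees of freedom; the p-value requires the F distribution, which is available for instance through `scipy.stats.levene` with `center="median"`. The edge lengths and the way the parent is assembled are hypothetical, and the exact test configuration of the original study is not reproduced here.

```python
from statistics import mean, median

def brown_forsythe(*groups):
    """Brown-Forsythe statistic and its (numerator, denominator) degrees of freedom."""
    z = []
    for g in groups:
        med = median(g)
        z.append([abs(v - med) for v in g])   # absolute deviations from the group median
    n_total = sum(len(g) for g in z)
    k = len(z)
    grand = sum(sum(g) for g in z) / n_total
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in z)
    within = sum((v - mean(g)) ** 2 for g in z for v in g)
    f_stat = (n_total - k) / (k - 1) * between / within
    return f_stat, (k - 1, n_total - k)

# Hypothetical edge lengths (metres): the parent mixes a tight and a loose pattern.
child_1 = [9.0, 10.0, 11.0, 10.0, 12.0, 9.0]
child_2 = [40.0, 75.0, 55.0, 90.0, 35.0, 60.0]
parent = child_1 + child_2 + [150.0]          # 150 m is the candidate edge to remove
f_1, df_1 = brown_forsythe(parent, child_1)   # parent versus first child
f_2, df_2 = brown_forsythe(parent, child_2)   # parent versus second child
```

In this toy case the tight child differs from the parent far more than the loose child does, so the first comparison is the one most likely to reject the equality of variances.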
3.2.3 Step II: the topologybased density index
Let i be a cluster of buildings resulting from the SDHC above. \(D^*_i\), the topology-based density of the cluster i, is then defined as the ratio of \(N_i\), the number of buildings, to \(L_i\) (Fig. 3g), the total length of the edges of the MST of i:
\[D^*_i = \frac{N_i}{L_i} \tag{1}\]
\(D^*_i\) is then a linear density (buildings per linear distance). It would be possible to use the square of \(L_i\) to obtain a surface version of the density. However, this transformation brings new biases (due to the different lengths of \(L_i\)) without bringing any new information.
\(D^*_i\) can then be mapped onto each building of the corresponding graph (see Fig. 3h). Unlike common practice, the denominator is no longer an a priori chosen surface but the length of the shortest tree connecting all buildings in a cluster. Contrary to grid-based approaches, this denominator is not constant over space: it varies in order to match the local spatial structure.
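As a worked illustration of the index, the sketch below computes the density of a single hypothetical cluster in buildings per km of MST. The function name and the cluster (31 buildings joined by 30 edges of 20 m each) are assumptions chosen only to make the arithmetic easy to follow.

```python
def topology_density(n_buildings, edge_lengths_m):
    """Buildings per km of MST: N_i divided by L_i, with edge lengths given in metres."""
    total_km = sum(edge_lengths_m) / 1000.0
    return n_buildings / total_km

# Hypothetical cluster: 31 buildings whose 30 MST edges are each 20 m long,
# so L_i = 600 m = 0.6 km.
density = topology_density(31, [20.0] * 30)

# The same value is attached to every building of the cluster for mapping.
density_per_building = [density] * 31
```

A walker covering 1 km along such a graph would thus encounter a little over 51 buildings, which is the interpretation of the index given in Sect. 3.2.1.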
4 Results
4.1 The minimum spanning tree
The minimum spanning tree computed on all buildings located in Belgium describes the global topological structure of the Belgian built-up landscape. As expected given the level of urbanisation of the country, the vast majority (95%) of the MST edges between buildings are shorter than 50 m (see Table 1). The histogram (Fig. 5) is right-skewed and bimodal, with a very strong peak around six metres and a secondary peak around 20 metres. The left part of the histogram shows short edges, which correspond to attached or very close buildings of small size. Note that the very small peak around three metres corresponds mostly to sets of contiguous building extensions. In the database, each extension is a small polygon (although larger than 12 \(\mathrm{m}^2\)). A set of small extensions joined together then creates short edges in the MST. The right part corresponds to edges between 15 and 50 m, characterising more detached buildings. Edges longer than 50 m (not illustrated in the histogram) are typical of more isolated buildings. The presence of a large peak at short distances, combined with a second peak at medium distances and the absence of a peak at longer distances, fully reflects Belgian urbanisation. This urbanisation is indeed characterised by numerous centres (towns, villages) connected by a strong level of suburbanisation and sprawl (Vanneste et al. 2008; Vandermotten et al. 2008; Vanderstraeten and Van Hecke 2019). One would expect a much smaller second peak and a rural peak (long distances) for a less continuous urbanisation, as is the case for example in the Netherlands.
4.2 Moran scatterplots and their outliers
Out of the 260,359 Moran scatterplot regressions performed during the whole process, 93% show a positive and significant slope (Moran’s I). This indicates a high degree of homogeneity in the spatial distribution of buildings within the Belgian landscape: at each iteration of the method, and therefore at all scales, there are very few abrupt discontinuities in the spatial distribution of the buildings. This confirms the observations made in Sect. 4.1. Furthermore, the high level of significance of the OLS regressions supports the relevance of using the Moran scatterplot to identify outliers.
We observe two types of outliers within the scatterplots: first, those corresponding to an edge surrounded by edges longer than expected by the global spatial autocorrelation (Fig. 6a), and second, those corresponding to an edge surrounded by edges much smaller than expected (Fig. 6b). An outlier separates two distinct topological forms (Fig. 7a) but in some cases it can simply isolate some remote buildings from the rest (Fig. 7b). An outlier therefore separates settlements, towns, city districts, villages, etc., or separates isolated and heterogeneous housing from a homogeneous structure of buildings (a farm on the outskirts of a village, a church in a city centre, etc.).
75% of the removed edges have a length between 30 and 80 m, while the median length takes the value of 44 metres (Table 2). The large observed range of removed edges shows that the iterative process implemented in the method allows the identification of clusters of different patterns at different scales and for different realities. Indeed, from the first removed edge up to the last one, the iterative process progressively splits the initial graph (all of Belgium) into a series of smaller graphs that outline nested clusters, each with its specific characteristics. The use of the Moran scatterplot allows the selection of the edge to be deleted taking into account these specificities. The method identifies the removed edge at each step. It is therefore possible to go back up the clustering tree to observe these different clusters at different scales. This would not have been possible when using a method based on an a priori defined threshold (Zahn 1971; Yu et al. 2014; Caruso et al. 2017). Moreover, the median value (43 m.) shows that the discontinuity in buildings is in the majority of cases much lower than the one generally used by those studies (between 100 and 200 m).
4.3 Clusters of buildings
At the end of step I, the method discriminates 26,462 subgraphs (see Fig. 19 in “Appendix 4”). Over \(95\%\) of the subgraphs have a coefficient of variation of the length of the edges smaller than 1, which means that within a subgraph, the lengths of the edges are homogeneous (Fig. 8). Each subgraph can thus be considered as a topologically homogeneous cluster of buildings.
Within each cluster, the variance of the inter-building distances is small, which results in the detection of built-up footprints characterised by a regular pattern of buildings (homogeneous topology). A first example is illustrated in Fig. 9. It includes two regular neighbourhoods (A and B). A is a compact village with a radial morphological structure; B is made up of a regular alignment of buildings that forms a linear ribbon development. A second example, located south of Brussels and already used in Sect. 2.2, is reported in Fig. 10. It is composed of eight homogeneous neighbourhoods (A to H). Each neighbourhood corresponds to a particular pattern of buildings, with a historical centre around the church (A), classical planned housing estates (B to F), and two more linear developments (G and H). Isolated buildings or heterogeneous groups of fewer than 30 buildings are left out (mainly isolated farms typical of the area).
Our 26,462 clusters can now be considered to be topologically relevant Basic Spatial Units (BSUs). Since each BSU is internally homogeneous in terms of distance between centroids, we can confidently use those units to compute indices such as density, which characterises the spatial distribution of building centroids within each cluster.
4.4 Density of buildings
The topology-based density is now computed for each cluster. Each group of buildings with a homogeneous topology has a specific density value. To illustrate this specificity of our method, we compare our results with those obtained by a simple grid-based density smoothed by a kernel function, using two examples (Figs. 11, 12).
With our method, only one density value is computed per cluster, whereas several values are needed with a grid. For example, in the case of Seneffe (Fig. 11), we identify seven clusters, with seven density values ranging from 19 to 86 buildings per km (Fig. 11b). The smoothed grid approach covers the whole area and computes densities ranging from zero to 31 buildings per hectare (Fig. 11c). While our method detects seven homogeneous building patterns with precise contours, the grid method suggests two or three main centres surrounded by a less dense periphery in the east and some shadows in the west. Similarly, in the case of Genappe (Fig. 12), our method detects three distinct homogeneous building patterns (53, 58 and 100 b/km), while the grid method delivers density values ranging from zero to 57 b/ha and shows a large centre in the west surrounded by a periphery that develops in a ribbon towards the east.
In each example, the different density values, associated with the different clusters, allow the identification of particular urban structures. In the case of Seneffe (Fig. 11b), the two clusters with a density of 73 b/km and 74 b/km include the buildings of the centre, consisting mainly of semidetached buildings along a main axis from north to south. In the periphery of the centre, the cluster with a density of 60 b/km is formed of detached buildings while the cluster of 86 b/km includes much more attached buildings (public housing). The clusters around 30–50 b/km are associated with housing estates wellseparated from the centre with exclusively detached buildings. Large interbuilding distances characterise the cluster with the lowest density (19 b/km) (industrial zone). In the case of Genappe (Fig. 12b), the cluster with a density of 100 b/km includes the buildings of the centre. In an extension of the centre, two clusters are identified with a density about 50–60 b/km consisting of wellseparated buildings. One of these clusters is a ribbons extension along a road from the centre (53 b/km), and the other is a more widely spread cluster assimilated to a district in the periphery of the centre (58 b/km).
Given the definition of the topology-based density (the number of buildings divided by the length of the MST), there is a direct relation between the density value and the average edge length within a cluster (\(\overline{l_i}\)). In fact, writing \(L_i\) for the total length of the MST of cluster \(i\) and inverting Eq. 1,
\[\frac{1}{D^*_i} = \frac{L_i}{N_i} \tag{2}\]
and because in a MST the number of edges is always equal to the number of points (\(N_i\)) minus 1,
\[\overline{l_i} = \frac{L_i}{N_i - 1} \tag{3}\]
Then, \(D^*_i\) becomes:
\[D^*_i = \frac{N_i}{N_i - 1}\,\overline{l_i}^{\,-1} \approx \overline{l_i}^{\,-\beta} \tag{4}\]
with \(\beta \approx 1\) when \(N_i\) is large (\(N_i \approx (N_i - 1)\)). For all 26,462 clusters obtained in Belgium, the value of \(\beta\) can be estimated (OLS after logging both sides of Eq. 4). We obtain a value of \(\beta\) equal to \(0.996\). This is indeed very close to 1 but shows a slight underestimation (overestimation) for graphs of longer (shorter) average length. In the words of Beguin and Thisse (1979), we show that density (\(D^*_i\)) (of buildings in this case) cannot be separated from the metric of the relative location of places (\(\overline{l_i}\)) and that this relationship follows a simple power law.
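The relation between the topology-based density and the average MST edge length can be checked numerically. The following is a minimal sketch, not the code of the replication archive: it builds a Euclidean MST with Prim's algorithm, divides the number of points by the MST length, and estimates \(\beta\) by a log-log OLS on a few hypothetical clusters. The function names, the metre-to-kilometre conversion and the synthetic clusters (aligned points with constant spacing) are assumptions made for illustration.

```python
import math

def mst_edge_lengths(points):
    """Edge lengths of the Euclidean minimum spanning tree (Prim, O(n^2))."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # shortest distance from each point to the tree
    if n:
        best[0] = 0.0          # start the tree from the first point
    lengths = []
    for step in range(n):
        # add the point that is closest to the current tree
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if step > 0:
            lengths.append(best[u])
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return lengths

def topology_density(points, metres_per_km=1000.0):
    """Number of points divided by the total MST length, in points per km.
    Coordinates are assumed to be in metres (assumption of this sketch)."""
    total_m = sum(mst_edge_lengths(points))
    return len(points) / (total_m / metres_per_km)

def loglog_slope(xs, ys):
    """OLS slope of log(ys) on log(xs), with an intercept."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx

# Hypothetical clusters: n buildings aligned with a constant spacing s (metres).
clusters = [[(k * s, 0.0) for k in range(n)]
            for s, n in [(5.0, 30), (10.0, 60), (20.0, 120), (50.0, 40), (100.0, 80)]]
mean_lengths = [sum(mst_edge_lengths(c)) / (len(c) - 1) for c in clusters]
densities = [topology_density(c) for c in clusters]
beta = -loglog_slope(mean_lengths, densities)   # expected to be close to 1
```

On such idealised clusters the estimated exponent departs from 1 only through the factor \(N_i/(N_i - 1)\), which is the source of the small deviation discussed above.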
According to Eq. 4, the topology-based density depends only on the inter-building distances, unlike a surface-based approach where the density can vary according to the area of the BSU without considering the distance between buildings. This might sound like a trivial result, but surface-based densities cannot differentiate between two BSUs that contain the same number of buildings, concentrated in one case and dispersed in the other. While other researchers would add further metrics to capture this (e.g. Galster et al. 2001; Berghauser Pont and Haupt 2005), our density measure suffices.
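A small numerical illustration of this point, under stated assumptions: two hypothetical groups of nine buildings are placed in the same 100 m by 100 m reference square (1 ha), once on a tight 10 m grid and once on a 50 m grid. The helper names and the layouts are invented for the example; the MST is computed with Prim's algorithm.

```python
import math

def mst_total_length(points):
    """Total length of the Euclidean minimum spanning tree (Prim, O(n^2))."""
    n = len(points)
    if n == 0:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u]            # contributes 0.0 for the starting point
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return total

def grid(origin, spacing, side=3):
    """side x side points on a regular grid (coordinates in metres)."""
    ox, oy = origin
    return [(ox + i * spacing, oy + j * spacing)
            for i in range(side) for j in range(side)]

def surface_density(points, area_ha):
    """Buildings per hectare of an externally fixed reference area."""
    return len(points) / area_ha

def topology_density(points):
    """Buildings per km of MST."""
    return len(points) / (mst_total_length(points) / 1000.0)

AREA_HA = 1.0                                  # the same 100 m x 100 m square
concentrated = grid((40.0, 40.0), 10.0)        # 9 buildings, 10 m apart
dispersed = grid((0.0, 0.0), 50.0)             # 9 buildings, 50 m apart

for name, pts in [("concentrated", concentrated), ("dispersed", dispersed)]:
    print(name, surface_density(pts, AREA_HA), round(topology_density(pts), 1))
```

Both layouts return the same surface-based value because the reference square is identical, whereas the MST-based value differs by a factor equal to the ratio of the spacings.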
Practically, if we know the value of the topology-based density, we can work out the relative spatial organisation of the buildings. Figure 13 illustrates four such cases. (a) A built density value higher than 100 b/km is obtained for buildings that adjoin or lie very close to each other (mean distances of less than 10 m), as is the case in city centres (Fig. 13a). (b) A value between 50 and 100 b/km is related to a topology of buildings relatively close to each other (mean distances between 10 and 20 m), as may be the case, for example, in the periphery of cities or in smaller centres (Fig. 13b). (c) Relatively well-separated buildings, such as in a peri-urban housing estate (mean distances between 20 and 50 m), have densities in a range between 20 and 50 b/km (Fig. 13c). Last but not least, (d) densities lower than 20 b/km reflect clusters of buildings with average distances greater than 50 m (Fig. 13d).
At the scale of the entire country, the newly computed topology-based density (see Fig. 20 in “Appendix 5”) shows a spatial structure that expresses urbanisation in Belgium (see Sect. 4.1).
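The four ranges above are consistent with Eq. 4 when \(\beta = 1\) and \(N_i\) is large: a mean distance of 10, 20 or 50 m corresponds to roughly 100, 50 or 20 buildings per km of MST. The sketch below encodes this reading; the function names are hypothetical, and the treatment of values falling exactly on a class boundary is an arbitrary choice of the example.

```python
def approximate_density(mean_distance_m):
    """Eq. 4 with beta = 1 and N/(N-1) taken as 1: buildings per km of MST."""
    return 1000.0 / mean_distance_m

def describe(density_b_per_km):
    """Map a topology-based density to the four cases (a) to (d) of Fig. 13."""
    if density_b_per_km > 100:
        return "a: adjoining or very close buildings (mean distance below 10 m)"
    if density_b_per_km >= 50:
        return "b: buildings relatively close to each other (10 to 20 m)"
    if density_b_per_km >= 20:
        return "c: relatively well-separated buildings (20 to 50 m)"
    return "d: distant buildings (mean distance above 50 m)"

# Density values quoted for the Seneffe and Genappe examples
for value in (100, 86, 53, 19):
    print(value, "->", describe(value))
```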
5 Discussion
5.1 Thematic contribution
We have proposed a spatial descending hierarchical clustering method that delineates clusters of buildings with homogeneous inter-building distances. Based on these clusters, we compute a topology-based density index whose denominator preserves the relative positions of buildings. We show (Sect. 4.4) that the index ultimately depends only on the average distance between buildings in each cluster. This is a strong advantage over standard surface-based densities, where density depends on the delineation and definition of a reference area (BSU).
The numerator considered here is simply the number of buildings. Depending on the final objective of the measure, other numerators could equally be used, such as the surface of buildings, their volume or height (see e.g. Yu et al. 2010). In our case, we could imagine using the surface area of the buildings in each cluster or the total surface area of their floors as the numerator. These indices would remain topology-based as the denominator does not change. The diversity of indices is therefore a function of the diversity of possible numerators. As shown by Wu et al. (2018), a large number of characteristics (distance, orientation, height, size, etc.) of buildings can easily be integrated into a graph. It is up to the planners to develop and use them according to their needs. In the same way, other parameters can be used as a basis for the SDHC. The association of a Cook’s distance with a Moran scatterplot can be used to distinguish the most different object in a cluster. Rather than having clusters with identical patterns in terms of distance between buildings, some could, for example, look at groups with similar building heights.
The only data input used here is the buildings. This can appear counterproductive since streets, squares, parks, and gardens are traditionally identified as important places in built-up realities (Gehl 1987). However, as expressed in Sect. 2, it is possible to link a large number of urban issues to the structure and proximity of buildings: energy consumption (Rinkinen et al. 2021), mental health (Sullivan and Chang 2011), population estimates (Tomás et al. 2016), etc. This is why we believe that focusing on the density of buildings can be relevant for urban space issues.
We are aware that the use of our graph-based index may appear difficult for urban policy makers, as they are often used to working on an externally determined surface basis. However, we believe that it is sometimes necessary to change the approach because of the biases and errors in measurement and interpretation induced by these surfaces. Moreover, measuring the density along a topological network (as proposed here) also has two practical advantages compared to a more classical surface approach. First, it is a more operational way to study the relationships and interactions between buildings and linear infrastructures. Linear infrastructures (electricity, gas, water) do not always follow roads, and their planning could also benefit from our measure. Our method makes it possible to determine which buildings are spatially connected and at which specific distance. Second, by using a distance-weighted graph, our measure lends itself to the study of the relationships and interactions that can exist between points (buildings). Indeed, the network approach allows us to distinguish points that are connected to each other while measuring the characteristics of each group. A parallel can be drawn with ecological research where graphs are used. For example, in the same way that Foltête and Vuidel (2017) delineate functional ecological zones by means of landscape graphs, we should be able to better measure and therefore better understand the relationships between people living in different spaces of a city, or to understand how these different spaces are organised.
Our topology-based index is a contribution to increasing the quality of the measurement and understanding of the morphology of the built space. We know that the topology of the buildings is only one aspect of the complexity of such space. Taking a multifactorial approach, it would certainly be possible to develop and combine other indices with the topology-based density index presented here. We already mentioned the addition of the third dimension (height of the buildings), but it is certainly also possible to adapt other indicators. The concentration index developed by Galster et al. (2001) could, for example, be adapted to measure whether the buildings within a cluster are aligned or form a block. Complementing the topology-based density with other indices could be the next step in this research.
5.2 Methodological limits
A first methodological limit is the use of the centroid of each building instead of the building footprint in the creation of the graph. We have seen that in our case, it is the distance between building centroids that must be considered in order to obtain a calculable and more easily interpretable measure. We note, however, that in some cases the interpretation of the measurement may lead to a poorer perception of the built environment. For example, a given spatial organisation of centroids may reflect the location of small buildings that are far apart (e.g. isolated farms) or very large buildings that almost join (e.g. industrial zoning).
The use of a Cook’s distance for identifying the outlier in the Moran scatterplot can also be discussed. There are many alternative graphical (scatterplot, boxplot, etc.) or analytical (standardised residuals, hat matrix, etc.) methods for detecting outliers in regression (Ampanthong and Suwattee 2009). Analytical methods have the advantage that they do not require human visual interpretation. Cook’s distance is pointed out by Ampanthong and Suwattee (2009) as one of the best indices for the detection of outliers in multiple regression. In our paper, we find it interesting because it combines residual information (is a point far from the line?) with information on the influence of each point in the regression. It should be noted that the method identifies a single outlier (the observation for which the Cook’s distance is maximum). If the outlier is clearly identifiable, the different indices will converge. If several outliers are present and are not clearly identifiable, results might not converge. We did not encounter this situation in our empirical analyses, but are aware that it could happen. Another limitation that should be investigated in the future with the help of statisticians concerns the use of outliers in a regression whose slope is not significant. This rarely happens in our case (5%) but could happen more often if someone wants to work on a less homogeneous variable than the distance between building centroids.
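To make the role of Cook’s distance concrete, the sketch below computes it for a simple linear regression of a spatial lag on a standardised variable, i.e. the regression underlying a Moran scatterplot, and returns the single observation with the maximum value. It uses the textbook formula \(D_i = \frac{e_i^2}{p\,s^2}\,\frac{h_{ii}}{(1-h_{ii})^2}\) with \(p = 2\) parameters; the function name and the toy data are assumptions, not values from our analyses.

```python
def cooks_distances(x, y):
    """Cook's distance of each observation in the OLS regression y = a + b*x."""
    n, p = len(x), 2                      # p = intercept + slope
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - p)   # residual variance s^2
    distances = []
    for xi, e in zip(x, residuals):
        leverage = 1.0 / n + (xi - mean_x) ** 2 / sxx   # diagonal of the hat matrix
        distances.append((e * e / (p * mse)) * leverage / (1.0 - leverage) ** 2)
    return distances

# Toy Moran scatterplot: standardised edge lengths and their spatial lags.
# The last observation is a long edge surrounded by short ones.
z = [-1.0, -0.5, 0.0, 0.5, 1.0, 3.0]
lag = [-0.9, -0.6, 0.1, 0.4, 1.1, 0.0]
d = cooks_distances(z, lag)
outlier = max(range(len(d)), key=d.__getitem__)   # index of the maximum Cook's distance
```

The last observation combines a high leverage with a sizeable residual, which is why it is the one singled out here.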
Another methodological limit concerns the use of the 10,000 metres threshold as a constraint on the removal of an outlier. On the one hand, the objective of the method is not impacted by the value of the threshold. Whatever the threshold, the method always creates clusters whose topologies tend to be more and more homogeneous during the iterations. On the other hand, the threshold can modify the scale at which the method will stop. A high value will lead to the creation of large clusters (large length of graph), whereas a small value leads to the creation of very fine scale clusters. The threshold of 10,000 seems to be the most relevant for density measurements in Belgium but remains debatable.
Last but not least, the choice of variance test can also be a source of discussion. Indeed, it is important a priori to check the distribution of the populations tested when carrying out a test of variance (Box 1953). We do not carry out this check systematically. However, we have noticed in a large majority of cases that the distribution of the length of the edges in the (sub)graphs was heavy-tailed (as shown in Fig. 5). Therefore, we opted for the Brown and Forsythe variance test. This test is the most appropriate for this type of distribution (it does not consider the most extreme 10% of the distribution) (Brown and Forsythe 1974). In comparison with other tests, the Brown and Forsythe test gave the most visually appropriate results. One way to further improve the method may be to systematically assess the shape of the distributions to be tested and to apply the most appropriate test to each case.
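For readers unfamiliar with this family of tests, the following sketch computes a Brown and Forsythe type statistic: a one-way ANOVA F statistic applied to the absolute deviations of each observation from the centre of its group. It is an illustration only. The median is used as the centre here, which is the most common variant, whereas the trimming mentioned above corresponds to a trimmed-mean centre; the function name and the edge lengths are hypothetical, and the p-value (from an F distribution with \(k - 1\) and \(N - k\) degrees of freedom) is not computed.

```python
import statistics

def brown_forsythe_statistic(*groups):
    """F-type statistic on absolute deviations from each group's median."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # absolute deviation of every observation from the median of its group
    z = [[abs(v - statistics.median(g)) for v in g] for g in groups]
    z_means = [sum(zj) / len(zj) for zj in z]
    z_grand = sum(sum(zj) for zj in z) / n_total
    between = sum(len(zj) * (m - z_grand) ** 2 for zj, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zj, m in zip(z, z_means) for v in zj)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical edge lengths (metres) of a subgraph, with and without the outlier edge
with_outlier = [10.0, 12.0, 11.0, 9.0, 10.0, 13.0, 300.0]
without_outlier = with_outlier[:-1]
statistic = brown_forsythe_statistic(with_outlier, without_outlier)
```

A large statistic indicates that the spread of edge lengths differs between the two groups; two groups with the same spread around different centres give a statistic of zero.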
6 Conclusion
Following Caruso et al. (2017), who sought to identify urban form patterns using graph-based methods, and authors such as Berghauser Pont and Haupt (2005), who sought to measure building density with distinct indices depending on spatial units, we develop here a method to obtain a built density index that preserves the topology of buildings. This means that we can now identify clusters of buildings with homogeneous inter-building distances, and we can further measure, for each cluster, the density of buildings while preserving information about their relative positions.
Our method works in two steps. After retrieving the centroids of the buildings, the first step consists of a spatial descending hierarchical clustering (iterative approach). Based on a minimum spanning tree weighted by inter-building distances, a Moran scatterplot combined with a maximum Cook’s distance is used to identify the edge of the MST that diverges most from its neighbours (outlier). This edge is removed if it meets several criteria; one of these is the inequality of the variances of the edge lengths with and without the outlier. At the end of step I, clusters of buildings with homogeneous inter-building distances are delineated. In the second step, the topology-based density is computed by dividing the number of buildings in a cluster by the total length of the MST connecting all buildings in that cluster.
The method is applied to all buildings located in Belgium. Clusters with homogeneous inter-building distances are clearly identified. For example, some clusters correspond to a compact village, others to a linear development along a road, or to a housing estate. For each cluster, the value of the newly developed density index reflects the topology, i.e. the relative position of buildings. A high (low) density will be measured when the distance between buildings is small (large). The topology-based density index is thus only influenced by the relative position of the buildings (average inter-building distance). This is not the case for standard density measures, which use an a priori fixed surface. Topology-based density is therefore a useful index for measuring and understanding built-up patterns.
Data and codes availability statement
The data and codes (Python) that support the findings of this study are available on Montero, Gaëtan; Caruso, Geoffrey; Hilal, Mohamed; Thomas, Isabelle, 2022, "Replication Data for: A partition free spatial clustering that preserves topology: application to built-up density", https://doi.org/10.14428/DVN/IX0JRW, Open Data @ UCLouvain, V1.
References
Ampanthong P, Suwattee P (2009) A comparative study of outlier detection procedures in multiple linear regression. In: Proceedings of the international multiconference of engineers and computer scientists 7
Anders KH, Sester M, Fritsch D (2001) Analysis of settlement structures by graph-based clustering. 10
Andrienko G, Andrienko N, Demsar U, Dransch D, Dykes J, Fabrikant SI, Tominski C (2010) Space, time and visual analytics. Int J Geogr Inf Sci 24(10):1577–1600. https://doi.org/10.1080/13658816.2010.508043
Angel S, Lamson-Hall P, Blanco ZG (2021) Anatomy of density: measurable factors that constitute urban density. Build Cities 2(1):264–282. https://doi.org/10.5334/bc.91
Anselin L (1995) Local indicators of spatial association - LISA. Geogr Anal 27(2):93–115
Anselin L (1996) The Moran scatterplot as an ESDA tool to assess local instability in spatial association. In: Spatial analytical perspectives on GIS, vol 4
Araldi A, Fusco G (2019) From the street to the metropolitan region: pedestrian perspective in urban fabric analysis. Environ Plan B Urban Anal City Sci 46(7):1243–1263
Arribas-Bel D, Garcia-López MA, Viladecans-Marsal E (2019) Building(s and) cities: delineating urban areas with a machine learning algorithm. J Urban Econ. https://doi.org/10.1016/j.jue.2019.103217
Assunção RM, Neves MC, Câmara G, Da Costa Freitas C (2006) Efficient regionalization techniques for socioeconomic geographical units using minimum spanning trees. Int J Geogr Inf Sci 20(7):797–811. https://doi.org/10.1080/13658810600665111
Batty M (2013) The new science of cities. MIT Press
Beguin H, Thisse JF (1979) An axiomatic approach to geographical space. Geogr Anal 11(4):325–341. https://doi.org/10.1111/j.1538-4632.1979.tb00700.x
Berghauser Pont M (2021) Measuring urban form. In: 28th international seminar on urban form (ISUF 2021): Urban form for sustainable and prosperous cities, Glasgow
Berghauser Pont M, Haupt P (2005) The spacemate: density and the typomorphology of the urban fabric. Nordisk Arkitekturforskning 18(4):55–68
Berghauser Pont M, Haupt P, Berg P, Alstäde V, Heyman A (2021) Systematic review and comparison of densification effects and planning motivations. Build Cities 2(1):378–401. https://doi.org/10.5334/bc.125
Berghauser Pont M, Stavroulaki G, Marcus L (2019) Development of urban types based on network centrality, built density and their impact on pedestrian movement. Environ Plan B Urban Anal City Sci 46(8):1549–1564. https://doi.org/10.1177/2399808319852632
Bobkova E, Marcus L, Pont B (2017) Multivariable measures of plot systems: describing the potential link between urban diversity and spatial form based on the spatial capacity concept. In: Proceedings of the 11th space syntax symposium, Lisbon, Portugal 16
Boeing G (2017) OSMnx: new methods for acquiring, constructing, analyzing, and visualizing complex street networks. Comput Environ Urban Syst 65:126–139. https://doi.org/10.1016/j.compenvurbsys.2017.05.004
Box G (1953) Non-normality and tests on variances. Biometrika 40(3–4):318–335. https://doi.org/10.1093/biomet/40.3-4.318
Boyko CT, Cooper R (2011) Clarifying and reconceptualising density. Prog Plan 76(1):1–61. https://doi.org/10.1016/j.progress.2011.07.001
Brown MB, Forsythe AB (1974) Robust tests for the equality of variances. J Am Stat Assoc 69(346):364–367. https://doi.org/10.1080/01621459.1974.10482955
Brunsdon C, Fotheringham S, Charlton ME (1996) Geographically weighted regression: a method for exploring spatial nonstationarity. Geogr Anal. https://doi.org/10.1111/j.1538-4632.1996.tb00936.x
Caruso G, Hilal M, Thomas I (2017) Measuring urban forms from interbuilding distances: combining MST graphs with a Local Index of Spatial Association. Landsc Urban Plan 163:80–89. https://doi.org/10.1016/j.landurbplan.2017.03.003
Churchman A (1999) Disentangling the concept of density. J Plan Lit 13(4):389–411
de Bellefon MP, Combes PP, Duranton G, Gobillon L, Gorin C (2019) Delineating urban areas using building density. J Urban Econ. https://doi.org/10.1016/j.jue.2019.103226
Deng M, Liu Q, Cheng T, Shi Y (2011) An adaptive spatial clustering algorithm based on delaunay triangulation. Comput Environ Urban Syst 35(4):320–332. https://doi.org/10.1016/j.compenvurbsys.2011.02.003
Ester M, Kriegel HP, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: vol 240, p 6
Ewing R, Hamidi S, Tian G, Proffitt D, Tonin S, Fregolent L (2018) Testing Newman and Kenworthy’s theory of density and automobile dependence. J Plan Educ Res 38(2):167–182. https://doi.org/10.1177/0739456X16688767
Flahaut B, Mouchart M, Martin ES, Thomas I (2003) The local spatial autocorrelation and the kernel method for identifying black zones: a comparative approach. Accid Anal Prev 35:991–1004
Fleischmann M (2019) momepy: urban Morphology Measuring Toolkit. J Open Sour Softw 4(43):1807. https://doi.org/10.21105/joss.01807
Fleischmann M (2021) The urban atlas: methodological foundation of a morphometric taxonomy of urban form. Urban Design Studies Unit Department of Architecture University of Strathclyde, UK, p 511
Fleischmann M, Feliciotti A, Romice O, Porta S (2020) Morphological tessellation as a way of partitioning space: improving consistency in urban morphology at the plot scale. Comput Environ Urban Syst 80:101441
Fleischmann M, Feliciotti A, Romice O, Porta S (2021) Methodological foundation of a numerical taxonomy of urban form. Environ Plan B Urban Anal City Sci 00:1–17. https://doi.org/10.1177/23998083211059835
Fleischmann M, Romice O, Porta S (2021) Measuring urban form: overcoming terminological inconsistencies for a quantitative and comprehensive morphologic analysis of cities. Environ Plan B Urban Anal City Sci 48(8):2133–2150. https://doi.org/10.1177/2399808320910444
Foltête JC, Vuidel G (2017) Using landscape graphs to delineate ecologically functional areas. Landscape Ecol 32(2):249–263
Galster G, Hanson R, Ratcliffe MR, Wolman H, Coleman S, Freihage J (2001) Wrestling sprawl to the ground: defining and measuring an elusive concept. Hous Policy Debate 12(4):681–717. https://doi.org/10.1080/10511482.2001.9521426
Gehl J (1987) Life between buildings: using public space (Island press ed.)
Godoy-Shimizu D, Steadman P, Evans S (2021) Density and morphology: from the building scale to the city scale. Build Cities 2(1):92–113. https://doi.org/10.5334/bc.83
Hillier B (1996) Space is the machine
Hurvitz PM, Moudon AV, Kang B, Saelens BE, Duncan GE (2014) Emerging technologies for assessing physical activity behaviors in space and time. Front Public Health. https://doi.org/10.3389/fpubh.2014.00002
Longley PA, Mesev V (2002) Measurement of density gradients and space-filling in urban systems. Pap Reg Sci 81(1):1–28. https://doi.org/10.1111/j.1435-5597.2002.tb01219.x
Lynch K (1984) Good city form (MIT press ed.)
Marshall S, Gil J, Kropf K, Tomko M, Figueiredo L (2018) Street network studies: from networks to models and their representations. Netw Spat Econ 18:735–749
Martino N, Girling C, Lu Y (2021) Urban form and livability: socioeconomic and built environment indicators. Build Cities 2(1):220–243. https://doi.org/10.5334/bc.82
Montero G, Tannier C, Thomas I (2021) Delineation of cities based on scaling properties of urban patterns: a comparison of three methods. Int J Geogr Inf Sci 35(5):919–947. https://doi.org/10.1080/13658816.2020.1817462
Moudon AV (1997) Urban morphology as an emerging interdisciplinary field. Urban Morphol 1(1):3–10
Okabe A, Satoh T, Sugihara K (2009) A Kernel density estimation method for networks, its computational method and a GISbased tool. Int J Geogr Inf Sci 23:7–32
Openshaw S (1983) The modifiable areal unit problem. Concepts Techn Mod Geogr 38:38
Pauleit S, Duhme F (2000) Assessing the environmental performance of land cover types for urban planning. Landsc Urban Plan 52(1):1–20. https://doi.org/10.1016/S0169-2046(00)00109-2
Porta S, Crucitti P, Latora V (2006) The network analysis of urban streets: a dual approach. Physica A 369(2):853–866
Rinkinen J, Shove E, Smits M (2021) Conceptualising urban density, energy demand and social practice. Build Cities 2(1):79–91. https://doi.org/10.5334/bc.72
Schirmer PM, Axhausen KW (2015) A multiscale classification of urban morphology. J Transp Land Use 9(1):101–130. https://doi.org/10.5198/jtlu.2015.667
Sullivan WC, Chang CY (2011) Mental health and the built environment. In: Dannenberg AL, Frumkin H, Jackson RJ (eds) Making healthy places. Island Press/Center for Resource Economics, Washington, DC, pp 106–116. https://doi.org/10.5822/978-1-61091-036-1_7
Sémécurbe F, Tannier C, Roux SG (2019) Applying two fractal methods to characterise the local and global deviations from scale invariance of built patterns throughout mainland France. J Geogr Syst 21(2):271–293. https://doi.org/10.1007/s10109-018-0286-1
Teller J (2021) Regulating urban densification: What factors should be used? Build Cities 2(1):302–317. https://doi.org/10.5334/bc.123
Tomás L, Fonseca L, Almeida C, Leonardi F, Pereira M (2016) Urban population estimation based on residential buildings volume using IKONOS2 images and lidar data. Int J Remote Sens 37(sup1):1–28. https://doi.org/10.1080/01431161.2015.1121301
Vandermotten C, Halbert L, Roelandts M, Cornut P (2008) European planning and the polycentric consensus: Wishful thinking? Reg Stud 42(8):1205–1217. https://doi.org/10.1080/00343400701874206
Vanderstraeten L, Van Hecke E (2019) Les régions urbaines en Belgique. Belgeo 1. https://doi.org/10.4000/belgeo.32246
Vanneste D, Thomas I, Vanderstraeten L (2008) The spatial structure(s) of the Belgian housing stock. J Housing Built Environ 23(3):173–198. https://doi.org/10.1007/s10901-008-9111-3
Williams K, Burton E, Jenks M (2000) Achieving sustainable urban form: an introduction. Achiev Sustain Urban Form 2000:1–5
Wu B, Yu B, Wu Q, Chen Z, Yao S, Huang Y, Wu J (2018) An extended minimum spanning tree method for characterizing local urban patterns. Int J Geogr Inf Sci 32(3):450–475. https://doi.org/10.1080/13658816.2017.1384830
Yu B, Liu H, Wu J, Hu Y, Zhang L (2010) Automated derivation of urban building density information using airborne LiDAR data and object-based method. Landsc Urban Plan 98(3–4):210–219. https://doi.org/10.1016/j.landurbplan.2010.08.004
Yu B, Shu S, Liu H, Song W, Wu J, Wang L, Chen Z (2014) Object-based spatial cluster analysis of urban landscape pattern using nighttime light satellite images: a case study of China. Int J Geogr Inf Sci 28(11):2328–2355. https://doi.org/10.1080/13658816.2014.922186
Zahn C (1971) Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Trans Comput C–20(1):68–86. https://doi.org/10.1109/TC.1971.223083
Zhang M, Kukadia N (2005) Metrics of urban form and the modifiable areal unit problem. Transp Res Record J Transp Res Board 1902:71–79
Acknowledgements
Geoffrey Caruso acknowledges support from the Luxembourg National Research Fund via the URBANFORMS project (INTER/MOBILITY/mobility/2020/14519030)
Appendices
Appendix 1: Chunking of the database
To limit computation time, the database is divided into different chunks spatially adjacent to each other. However, this chunking produces border effects that might bias the results. Indeed, two buildings potentially belonging to the same morphological structure can be separated if they belong to two different chunks. To avoid these border effects, a second chunking process is applied to the same dataset (Fig. 14). The method runs for each chunk separately. The results of the chunks from chunking 1 are then combined, but the subgraphs within 1,000 metres of the chunk borders are replaced by subgraphs from chunking 2 to give the building density of Belgium (Sect. 4).
The final results of the method are not dependent on the initial chunking. Indeed, if each building is assigned the density value of the subgraph to which it belongs, then more than 92% of the buildings have exactly the same density with chunking 1 and chunking 2. If buildings within 1,000 metres of the chunking boundaries are not taken into account (boundary effect), the percentage of identical density between the two chunkings increases even more to 97%.
Appendix 2: Calculation of the spatial lag
The spatial lag is the product of the row-standardised adjacency matrix with the standardised vector of distances between the centroids of the buildings. The adjacency matrix (diagonal equal to zero) is obtained by the product of the incidence matrix with its transpose (Fig. 15).
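The calculation can be sketched on a toy tree as follows. This is an illustration under assumptions, not the code of the replication archive: the observations are taken to be the edges of the MST, the incidence matrix is written with one row per edge and one column per building so that its product with its transpose is an edge-by-edge matrix, and the lengths are standardised with the population standard deviation. The function name and the example data are hypothetical.

```python
import statistics

def spatial_lag(edges, lengths, n_nodes):
    """Standardised edge lengths and their spatial lag on a tree.

    edges   : list of (node, node) pairs, one per MST edge
    lengths : edge lengths, in the same order (assumed not all equal)
    """
    m = len(edges)
    # incidence matrix: one row per edge, one column per node
    B = [[1 if node in edge else 0 for node in range(n_nodes)] for edge in edges]
    # adjacency between edges = B times its transpose, with the diagonal set to zero
    A = [[0 if i == j else sum(B[i][k] * B[j][k] for k in range(n_nodes))
          for j in range(m)] for i in range(m)]
    # row standardisation: each row sums to one (rows without neighbours stay at zero)
    W = []
    for row in A:
        s = sum(row)
        W.append([a / s if s else 0.0 for a in row])
    mean = statistics.fmean(lengths)
    sd = statistics.pstdev(lengths)
    z = [(length - mean) / sd for length in lengths]
    lag = [sum(w * zj for w, zj in zip(row, z)) for row in W]
    return z, lag

# Toy tree: four buildings in a chain, the last edge being much longer
edges = [(0, 1), (1, 2), (2, 3)]
lengths = [10.0, 10.0, 40.0]
z, lag = spatial_lag(edges, lengths, n_nodes=4)
```

In this chain, the lag of the middle edge is the mean of the standardised lengths of its two neighbours, while each end edge simply takes the value of the middle edge. Plotting `lag` against `z` gives the Moran scatterplot used to look for the outlier.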
Appendix 3: Sensitivity to condition (1)
Condition (1) expresses the fact that if the total length of the graph is higher than 10,000 metres, the outlier is removed. With this condition, we ensure that the method detects fine-scale morphological features. We assume that a graph longer than 10,000 m is too heterogeneous with regard to its inter-building distances. It is therefore necessary to remove the outlier detected in the Moran scatterplot to allow the method to search for more homogeneous topological clusters as it does in condition (3). Note that the 10,000 m threshold was not selected at random: several values were tested (see examples in Figs. 16, 17 and 18). With values larger than 10,000 m, the algorithm cannot distinguish different topological patterns, especially in the countryside. Conversely, with values lower than 10,000 m, the algorithm detects small local topological differences that are not relevant in terms of density measure.
Appendix 4: The 26,462 clusters for Belgium according to the SDHC
See Fig. 19.
Appendix 5: The topology-based built density index computed for Belgium
See Fig. 20.
About this article
Cite this article
Montero, G., Caruso, G., Hilal, M. et al. A partition-free spatial clustering that preserves topology: application to built-up density. J Geogr Syst 25, 5–35 (2023). https://doi.org/10.1007/s10109-022-00396-4
Keywords
 Density
 Topology
 Graph
 Moran scatterplot
 Buildings
JEL Classification
 R00
 R14
 C49
 O18
 O21