Wider urban zones: use of topology and nighttime satellite images for delimiting urban areas

In the literature on the deﬁnition of urban areas, the methodological approaches are divided into formalist (aggregation by density thresholds) and functionalist (aggregation by commuting quotas). This paper proposes a mixed approach, in which the territorial density threshold from the lower-level administrative unit is combined with the brightness of nighttime satellite imagery, intended as a proxy variable for the functional links. The objective is to attain a method for the delimitation of urban areas, to be used by various States and Regions across the world in an iterative procedure, for the delimitation of urban areas as connected topological spaces. This represents an independent method, compared to the various standards adopted by national and regional statistics bureaus, which allows comparing the infrastructural, economic, and social data of different cities in the world. Such cities are hence described in terms of the “real” dimension of the urban areas, partially correcting the bias related to the adoption of administrative perimeters as a “fact” when local authorities make decisions regarding them.

Agglomerations and conglomerations have been intimately related to the progress of urbanization and the development of urban public transport in the late nineteenth century. The term, which appeared at the end of the nineteenth century, did not raise great interest until the early decades of the twentieth century. Urban morphology became a field of interest following the diffusion of private cars and the functionalists' new urban planning approaches. This concept was systemically defined in the first half of the 20th century. For example, in Switzerland, in 1930, the Federal Statistical Office (Bundesamt für Raumentwicklung ARE 2006 defined an "urban agglomeration" as a seamlessly built area, consisting of a center and an urban belt, where at least 20% of the population had to work in the primary sector and at least 30% of the active population had to be commuters. In the USA (Adams et al. 1999) and Canada (Statistics Canada 2011), the aggregation of administrative units into Metropolitan Statistical Areas is performed only according to the demographic size, without taking into account functional parameters. This approach originated in the United Kingdom (UK Official Statistics 2011), in an attempt at finding a measure of the real size of urban areas. However, it did not lead to an evolution in the administrative division. Only in France (2015), a (simplified) statistical model has been the basis for a revision of the administrative division. The French model is only based on additive heuristics, without considering commuting flows. Similar approaches also followed this trend in Germany (DESTATIS Statistiches Budensamt 2019) and Japan (Institute for Urban Strategies The Mori Memorial Foundation 2019); however, these analyses are limited to the main urban areas (over 500,000 inhabitants).
The purpose of discretizing urban and non-urban space has already been discussed in the context of the modifiable areal unit problem (MAUP, (Gehlke and Biehl 1934)). Openshaw (Stan 1983) observed that "the areal units (zonal objects) used in many geographical studies are arbitrary, modifiable, and subject to the whims and fancies of whoever is doing, or did, the aggregating". The issue is still relevant for regional political economy, with special regard for the functional delineation of the urban segments. The use of small-scale spatial data plays a major role in that research field (Rüdiger and Neumann 2019) both for socio-economic (Wong 2009) and transport analyses (Viegas et al. 2009).
The basic algorithm for economic analyses has been devised in the '80s for the analysis of Travel-to-Work Areas in Great Britain (Coombes et al. 1986). It is a classical constructive heuristic, while the contiguity constraint of the districts is estimated a posteriori. This approach is used also in Italy (ISTAT Istituto Nazionale di Statistica 2017) with commuting geography.
The issue of the definition of metropolitan areas is taken up by Tonev et al. (Tonev et al. 2017), who investigate the methodological approaches related to the functional integration of regional units and inter-regional relations. They focus on those used within Integrated Territorial Investments (ITI) to delimit the metropolitan areas analyzing the functional integration of regional units and inter-regional relations. ITI ("Community Led Local Development", (European Urban Knowledge Network 2018)) has been introduced by the EU, encouraging member states to define and study their metropolitan areas. Galdo et al. (2019) propose a method to identify urban areas by combining data from the ground (Census) and space (Google Earth). Bosker et al. (2020) focus on Indonesia and make use of data on commuting flows, spatially fine-grained population, and remotely sensed nighttime lights to define metropolitan areas using several approaches from the literature. Moreno-Monroy et al. (2020) present a method to delimit metropolitan areas-as functional urban areas (FUAs)-by analyzing the effects of the thresholds (density, commuting share). Maravalle and Simeone (1995) considered the problem of subdividing a region into districts-formed by geographically contiguous sites-with the highest possible homogeneity concerning a certain set of characteristics. Hansen et al. (2003) provided an exact algorithm for regional clustering. On the same path, Duque et al. (2011) introduced a new spatially constrained clustering problem called the max-p-regions problem. It involves clustering a set of geographic areas into the highest possible number of homogeneous regions so that the values of spatially extensive regional attributes are above a set threshold.
A standard definition of the metropolitan area was developed by OECD (2012), (Veneri 2016) and Eurostat (Schiavina et al. 2019;Eurostat 2019a): it is based on the aggregation of municipalities around a pivot city following functional thresholds (e.g., a minimum share of daily trips toward the pivot).

From static satellite data acquisition to AI procedures
As in the rest of geographic sciences, the availability of satellite data has also been discontinuous across human geography. Kim (1890) proposed the first study on US metropolitan areas, using both cartographic and satellite data. Uchida et al. (Uchida and Nelson 2011), in absence of O-D commuting flow data, introduced the Agglomeration Index (AI) to measure the "thickness" of urban concentration. Duranton (2015) adopted a functional approach, by proposing a commuting algorithm. Following a series of complex procedures (Bankert et al. 2011) to clean the white noise of natural and anthropogenic-non-urban-light sources, nighttime satellite images can be the main tool for the analysis of cities (Chen et al. 2017). Smith (2016) investigated applications and techniques for socio-economic research with online interactive thematic mapping using post-processed satellite images. Freire et al. (2016) investigated the relationship between territorial density and size of the spatial sampling grid used for satellite images. Bennet & Smith (2017)'s reviews develop the use of multitemporal DMSP-OLS and VIIRS imagery to analyze urbanization, economic, and population dynamics across a range of geographic scales. The second-generation NASA "Black Marble" images (Román et al. 2018) offer a greater definition and are also provided as a georeferenced raster for GIS analysis. They allow multiple outputs, ranging from verisimilitude (Roychowdhury et al. 2011) to machine learning-aided linear discretization techniques (Rybnikova et al. 2021).

Why adopt a simplified approach?
The definition of the functional space of cities is still a trending topic in regional political economy, especially in the framework of a functional delimitation of urban segments. The use of small-scale spatial data plays a key role in this research area (Budde and Neumann 2019). In addition to the socio-economic studies based on the definition of territorial grids, several studies use sensor-generated data, such as daytime and nighttime satellite images, for the segmentation and classification of urban and non-urban space (Skyglowberlin 2022). Elvidge et al. (2021) have carried out an in-depth investigation of the opportunities provided by the satellite dataset for the detailed delimitation of urban areas, reducing signal interferences in (Zhou et al. 2015), they provide a high-detail global mapping. However, the more detailed satellite images are, the lower the compatibility with local datasets is, as these are based on local administrative divisions. The subdivision of these data, which is virtually possible, would make multi-national comparisons much more complex. Different needs must be balanced to achieve the set goal: that is, to provide a methodology for statistical and analytical comparisons between the different cities in the world.
The difference between "administrative" and "real" cities is inversely proportional to their size, as will be demonstrated in the following paragraphs. This law has a global validity, despite some differences between continents.

Methodology
The proposed method adopts local subdivisions to delimit urban areas as wider urban zones, instead of administrative division. In this way, the statistical bias of the MAUP is kept but no new perimeters are identified, preserving the connection with the local geostatistical history. Urban areas, defined by a double threshold (territorial density and average brightness from satellite nighttime images) are treated as topological spaces.
In this sense, an urban area is an out-and-out topological space formed by the aggregation of several local administrative units. This allows transposing the intuitive concepts of "distance" and "continuity" from the frontier into the study of urban areas.

From the agglomeration to the connected domain of LAUs
The proposed method operates on the minimum territorial divisions, that is Local administrative units (LAUs); it performs the discretization of urban areas through the identification of homogeneous areas by territorial density. These elements are urban islands characterized by a good degree of urbanistic homogeneity. Each island is a geographical space with the following characteristics: homogeneous urban structure, related to the homogeneity of territorial distribution; possibility of being topologically represented by a simply connected domain (that is, a density-connected domain).
Starting from the main LAU-the pivot-, urban zones can be delimited in two ways: 1. by aggregating contiguous LAUs; 2. by disaggregating contiguous LAUs that do not meet aggregation conditions, and then by aggregating only the urban portions of the LAU to the pivot.

The density threshold
The first step is the definition of a density threshold to discriminate whether an LAU may or may not be part of the examined urban zone. The density threshold is not a standard assumption: it depends on the density of the upper administrative level (national/regional). So, the urban threshold of a region with a low density will be lower than for a region characterized by a higher density. Of course, this is a necessarily simplified procedure, as this numerical threshold approximates a much more complex real situation. Urban structures are interrelated by multiple elements: population density (conceived as an approximation of the contiguity of the built-up area) is just one of them. The key issue is the motorization rate of the population: the higher it is, the more land-rarefied (sprawling) urban structures will be. A city does not exist as a fixed entity (built-up area) but as a cloud of high-density relational paths.
In this sense, the subordination of one municipality to another is also described by the ratio between the mean number of daily trips between the two areas, divided by the total value. There are several values in the technical literature, but gener-  ally, a level of 30% is assumed to define urban contiguity and a level of 10% for metropolitan-level contiguity. In general, the existence of both spatial and functional contiguity (percentage of trips) is well approximated when fulfilling a set density threshold density. This approximation is acceptable (i.e., trips are underestimated by about 5-7% and not overestimated) for transport simulation. Given the density d [people/km 2 ] of the third-level administrative subdivision (NUT3 or NUT2 [Eurostat 2021], as in European NUTS 1 ), of the examined urban zone, the density threshold Θ is defined as (Fig. 1): The analysis of the different local conditions has led to the choice of a dynamic, rather than static, density threshold. As shown in Rüdiger and Neumann, changes in the threshold lead to changes in the granularity of the analysis. (Table 1).

Data sources
The data used to apply the described procedure have been obtained from the following sources: World national and administrative boundaries, FAO (2020) VIIRS Nighttime Light (VNL) satellite images (2020) Gridded Population of the World (GPW), version 4 (2019) Fig. 2 a The example represents a simplified administrative district with four distinct urban zones. For the first area, a.1, the WUZ is derived from the aggregation of three LAUs with different densities. In the area a.2, the WUZ is formed through the incorporation of two municipalities by density and annexation of a third one (with a density below Θ) by contiguity. a.3 and a.4 are mononuclear urbanisms: that is, single cities with a single municipality. b The second example shows how an urban zone can be formed even by disaggregation of a demographically inhomogeneous LAU boundary. A common non-simply connected boundary is divided into two sub-domains. One of them hosts the main urban nucleus, and the other one represents the remaining non-urban land. In b.4 the WUZ coincides only with the urban sub-domain. In the case of additive procedure: the core municipality (pivot)-considered a topographic space-is aggregated to the municipality in the concave space. The density of the resulting WUZ is lower than the original ones

WUZ building procedure
The method is iterative. Let the m zones of a given country with a third-level administrative subdivision NUT3 (or NUT2) be called regions Ri. Regardless of the local denomination, the model recognizes for each country, the subdivision into regions and the division of each region into local administrative units. Each region Ri contains n LAUs ordered from 1 to n in descending order based on the resident population. The procedure is iterative and is repeated for each LAU0 of Ri with a population greater than 50,000 inhabitants. The method is aimed at estimating the effective size of a city as the aggregation of several LAUs. Of course, if the urban zone is all contained within the LAU boundary, the method will not provide any aggregation.
The proposed procedure is an evolution of the pure topological model based on the density threshold-proposed in (Spinosa 2017)-integrated with the acquisition of average brightness within the perimeter of LAUs. For each region Ri and for each LAUk with more than 50,000 inhabitants and with a density d, the following additive algorithm (Fig. 2a) is performed to build the wider urban zone WUZj: 1. Acquire the administrative boundary of LAUk 2. Acquire the average brightness Bk calculated within the boundary of LAUk 3. Calculation of the threshold density Θ as in (1) 4. n = total number of LAUs in the region Ri 5. v = 1 K 6. w = 1 7. WUZ j D LAU k 8. For v = 1 to n repeat the following cycle: 9. m = number of LAUs adjacent to LAUk 10. For w = 1 to m repeat the following cycle: 11. Acquire the administrative boundary of LAUw 12. Acquire the average brightness Bw calculated within the boundary of LAUw 13. Acquire the density dw of LAUw 14. If B w 0,5B 0 then 15. If OEd w ‚ and LAU w 6 2 .WUZ k ; k D 1toj / then WUZ j D WUZ j C LAU w 16. w = w + 1 17. v = v + 1 18. End.
The WUZ building process by disaggregation (Fig. 2b) consists in repeating the aggregation process on a single LAU scale. The reference unit is the minimum investigation radius %R. Given a LAU, if R0 is the administrative division of higher level (NUT3 or NUT2), made up of n LAUh with a surface σh [km 2 ], then %R is: Given a LAUh boundary (with a population Πh and a surface σh [km 2 ]) adjacent to a forming UAo, LAUh is considered anomalous if: In this case, LAUh is divided into η regular square portions LAU k h [k = 1,η] with a side %R [km] and an area % R 2 [km 2 ]. At this point, the procedure continues as in the additive case. Each of the η LAU k h parts has: Of course, h D˜ ı k h . The aggregation gradually incorporates the individual LAU k h parts that meet the density threshold: LAUh is demographically inhomogeneous if not all the η parts LAU k h are incorporated in the formed UAo. In algorithmic form, the disaggregation procedure for LAUh is as follows: 1. Calculate %R as in (2) 2. k = 1 3.˜D h =4% R 2 4. If˜ 4 then 5. Divide LAUh into η equal parts 6. For k from 1 to η then 7. if ı k h ‚ then LAU 0 D LAU 0 C LAU k h 8. k = k + 1 9. End. The last step is to make the LAU0 a connected topological domain. LAU0 could have holes or points or, again, concave areas. Given the n administrative units of R0 not included in LAU0 (internal, as holes, or external): 1. U D LAU 0 2. i = 1 3. If U is a connected topological space or i = n 4. Then go to end 5. else while U is not a connected topological space This sub-procedure is a morphological homogenization process, which can include the addition of other LAUs or previously discarded (due to the low density) parts of LAU.

Conflict points
The WUZ building procedure has two possible ordinary conflict points, which occur when the aggregative process: a) meets the boundary of the region Ri b) meets a LAU that has already been incorporated into another WUZ Conflict (a) is resolved by continuing the procedure in the adjacent region. This conflict also results to be avoided by applying the procedure simultaneously to the entire State, without operating region by region. However, there will still be anomalies in cross-border WUZs between two neighboring states; the best solution is to proceed on a case-by-case basis by incorporating the foreign region into the WUZ building procedure.
Conflict (b) is resolved in two steps. The procedure is run on the LAUs ordered from the largest to the smallest population. Given a WUZj, the algorithm positively verifies the aggregation of LAUw, but LAUw is part of a previous formed WUZk. Then the following sub-procedure is: 1. If the density .WUZ k / ‚ then 2. If the population of the pivot city (WUZj) > 1.5 · population of pivot city (WUZk) then 3. WUZ j D WUZ j C WUZ k 4. Else if the population of pivot city (WUZk) > 1.5 · population of pivot city (WUZj) then WUZ k D WUZ k C WUZ j 5. End. Therefore, the adjacency of two WUZs implies that the largest one incorporates the other if two conditions are met: that is, the fulfillment of the density threshold and a ratio between the population of the respective pivot cities > 1.5. Conversely, the two areas, although adjacent, remain separate.

Results
This repository contains the tables-available for consultation-that result from the application of the proposed method for all the countries of the world. Considering the homogeneity of data availability, the year 2015 has been chosen as the demographic horizon. The tables, arranged by continent and macro-region, show the following data: name of the WUZ, region (NUT3 or NUT2), area (in km 2 ), population as of 1/1/2005 and as of 1/1/2015, annual growth rate (AGR), territorial density (pop. per km 2 ).
The following images show the comparison between the satellite images and the aggregated perimeter of WUZs with a population exceeding 200,000 inhabitants. The colors-increasing from yellow to red-indicate territorial density. (Figs. 3, 4 and 5).

Discussion
The results of the proposed procedure for the delimitation of urban zones with an additive method based on brightness and density thresholds strongly differ from those obtained with commuter flow-based methods. This matches the purpose of this work, which is to define a heuristic for the identification of Local administrative units in relation to the built environment. Fig. 6 shows the results of the proposed model for the Italian case. This method overcomes the issues related to the availability of commuting matrices; moreover, the methods that adopt aggregation thresholds linked to dynamic factors are more suitable for simulating the metropolitan area, that is, the space of a city.
The use of LAUs produces a less pleonastic perimeter. On average, in the Italian case, the perimeters produced by the proposed method are characterized by a built/ green area ratio of 1: 0.75/0.90. According to the "Local labor system" (LLS) of Istat the ratio is 1:0.7-8.5 (mean 1.87). The mean Jaccard distance dJ between the WUZs and the LLSs is 0.715. Despite the irregularity of the progression, Fig. 7 shows that the differences between the two methodologies tend to decrease as territorial    Fig. 7 Italy, the relation between WUZ density and Jaccard distance relative to the two distributions of the previous figure density grows. This confirms the confidence level of satellite image brightness as an alternative to commuting intensity, which is at the base of the Istat functionalist model. The use of local territorial distribution does not solve the MAUP bias, as the small case of the Italian case already shows: the average size of the LAU (Comune) varies between 15.8 km 2 in Lombardy and 76.1 km 2 in Apulia. However, this allows working with the calculated urban zones, using the existing databases, which must not be divided and projected on a different territorial division.
The application on the Italian case shows that the aggregated zones generated by the proposed model are more extended than the real urban area, and less extended than the metropolitan area. The comparison with the French "Unité urbaine" (I.n.d.l.s.e.d.é. économiques 2022), based on a formalist approach (density thresholds and number of workers) reported a value of dJ = 0.455. Instead, the comparison with the perimeters of the Travel to Work Areas (TTWAs) of the United Kingdom (O.f.N. Statistics 2016), which are determined through a functionalist approach, showed dJ = 0.765. In general, the estimations produced by the WUZ method seem to be 10-50% lower than those obtained with a formalist method. When comparing the results with the areas obtained through a functionalist approach, the (negative) difference is higher (beyond 50%).
Future research will include an in-depth comparison with the models based on the (fixed or dynamic) commuting threshold instead of the density threshold. It seems necessary to investigate the propensity to commuting within the WUZs to examine the circadian physiology of inner versus outer LAUs. In particular, the observations on the Italian case have suggested future research on the variation of brightness in relation to the number of commuting movements between a pivot city and the aggregated LAUs. This will contribute to the objective of standardizing the interpretation model of urban areas on a general and extra-national base.

Supplementary Information
The online version of this article (https://doi.org/10.1007/s10037-022-00169-y) contains supplementary material, which is available to authorized users.
Funding Open access funding provided by Università degli Studi di Roma La Sapienza within the CRUI-CARE Agreement.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4. 0/.

Conflict of interest
A. Spinosa declares that he has no competing interests. K