
1 Research on the Spatial Organization of the Settlement System and Usable Data Sources

This study is based on the assumption that the settlement system represents a large set of complex processes, variable in time and space, between particular components of society and the landscape. These processes result in socio-spatial differentiation, which manifests itself most clearly as a spatial concentration of activities within society.

The spatial concentration of activities is a natural process in the development of social systems. A certain degree of concentration is necessary, because activities with different degrees of rarity cannot be made equally available in all locations. This is the very essence of the formation of settlement systems: it is the concentration of activities that allows them to arise.

The natural character of these differentiation processes and the tendency of social processes to create spatial differences have historically attracted significant interest in the study of these phenomena. Some of these works have become key studies that established paradigmatic schools of thought not only in social and regional geography as a whole but also in, e.g., regional economics and other social sciences.

The fundamental studies of the regularities in the spatial arrangement of social activities and their reflection in the settlement structure are based on the principles described in location theories [26,27,28,29] and, much later, in the theories of the new economic geography (NEG), which offer a more realistic view of how the distribution of population and economic activities in space is conditioned, because they take into account a number of additional influencing factors (see e.g. [30, 31], or [32]). Spatial differentiation is also the essence of polarization theories [33, 34], which, like NEG and the structuralist, institutional, and other theories of regional development generally formulated at the global level, are also applicable to socio-spatial processes at various size levels, including the microregional level (for more detail see e.g. [35, 36]).

However, with increasing concentration/differentiation, the integration of spatial units also grows, creating complex systems (regions) that include a core and a periphery (see e.g. [33] or [34]). These processes operate at hierarchically different size levels (see e.g. [5,6,7], or [15]). Hence, the settlement system and its complexity have both a horizontal and a hierarchical (vertical) dimension. It is therefore a complex socio-spatial process involving a whole range of interactions, the result of which is a hierarchically differentiated system of mutually overlapping ties that holistically covers the entire spectrum of human activities in space and acts differently at different hierarchical size levels.

The relationships between the individual elements of the settlement system are hierarchical, and diffusion processes occurring between them ensure the spread of development potential, trends, and innovations (see e.g. [37]). Spatial diffusion is one of the main differentiating processes forming the settlement system, although the nature of its action varies with the hierarchical size level and with the nature of the phenomenon in question. As a result of hierarchization, new bearers of differentiation appear in the settlement (or economic) system, whereas for developmentally lower-order phenomena, diffusion processes lead to a levelling of interregional differences [6, 38].

From the perspective of the organization of the settlement system, these processes create specific regional structures (regions) at each hierarchical level (macroregional, mesoregional, microregional, or sub-microregional). These regions are strongly integrated internally by processes of a certain type (depending on the hierarchical size level) and, at the same time, are relatively closed in their relations to the outside.

The study of spatial interactions in the settlement system, aimed at understanding socio-spatial region-forming processes, is based on the principles of gravity theories, which generally assume that the influence of a center on its surroundings decreases with increasing distance and depends on the size/importance of the center and the activities concentrated in it. Every element of the settlement system (or place of economic activity) interacts with its surroundings and plays both a generating role (supply) and an attracting role (demand). Gravity models of spatial interaction built on these principles differ mainly in the parameters they take into account (see e.g. [17, 39,40,41,42,43,44]).
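The gravity principle described above is often written in the textbook form T_ij = k · M_i^α · M_j^β / d_ij^γ, where M_i and M_j are the "masses" (size or importance) of two places and d_ij is the distance between them. The Python sketch below assumes this classic form with illustrative default exponents; the cited models differ precisely in how these parameters are specified, and this is not a formulation used in the present study.

```python
def gravity_interaction(mass_i: float, mass_j: float, distance: float,
                        k: float = 1.0, alpha: float = 1.0, beta: float = 1.0,
                        gamma: float = 2.0) -> float:
    """Classic gravity model: interaction grows with both masses and decays with distance.

    k, alpha, beta and gamma are illustrative defaults, not calibrated values.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    return k * mass_i ** alpha * mass_j ** beta / distance ** gamma

# With gamma = 2, doubling the distance reduces the interaction to one quarter.
near = gravity_interaction(100, 200, 10)
far = gravity_interaction(100, 200, 20)
```

The distance-decay exponent gamma is the parameter that most directly expresses the "decreasing influence of the center with increasing distance"; fitting it to observed flows is one of the ways the cited models differ.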

The interconnectedness of individual processes within the settlement system is so complex that it cannot easily be identified, let alone measured directly. However, the external manifestations of these processes are measurable. They take the form of spatial interactions and manifest themselves as commuting relationships that differ at various hierarchical levels (see e.g. [5,6,7,8,9,10,11,12, 14, 16, 45, 46], or [37]). These interactions are realized through transport links, which transport geographers have measured for a long time (see e.g. [1, 3, 4, 11, 13, 15, 47], or [22]). The interactions differ not only quantitatively but also qualitatively across hierarchical size levels. In general, the described spatial interactions can be called population mobility. Mobility comprises not only the actual journeys but also the overall spatial pattern of each individual's behavior. Hence, the mobility/commuting behavior of an individual reflects the intensity, frequency, and repeatability of certain elements of spatial behavior [48, 49]. This also determines the hierarchical position of the commuting destination and its relationship to the place of origin. In conclusion, the spatial behavior of the population reflects the relationships and processes within the settlement system and is therefore a suitable object of measurement for explaining them.

A wide range of tools can be used in both local and large-scale statistical surveys of the travel behavior/mobility of residents, such as questionnaire surveys, travel diaries, GPS loggers, counts of passengers carried by individual modes of transport, or traffic-intensity measurements (see e.g. [4, 11, 15, 17, 22], or [23]). In Czechia, questions about commuting to work and school are even part of the census; however, response rates for these questions have been low in recent censuses, and it is assumed that up to 40% of commuting flows are missing from census statistics. With this in mind, the geolocation data of mobile operators offer significant potential for measuring mobility. Owing to the high penetration of mobile devices in the population and the possibility of tracking movement over arbitrary periods, this approach combines the advantages of population-wide data collection with those of detailed movement-tracking studies (see e.g. [21, 50, 51] or [47]).

The method is based on records in the geolocation network, which are created every few minutes by every device connected to the GSM network via a SIM card. The location determined in this way is only approximate, because only the position of the transmitter (BTS) that registered the record is precisely known. From the signal coverage map of individual transmitters, the approximate location of the SIM card can be deduced with an accuracy of hundreds of meters in urbanized areas and up to a few kilometers in rural areas.

Obtaining this type of data requires a complex set of tools analyzing more than 10 million SIM cards (in the case of Czechia), each of which produces thousands of records within the measured periods. In addition to the technical solution and the considerable computing capacity required for Big Data processing itself, it is essential to establish consistent methodological procedures for the preliminary processing of primary records in order to create databases of citizens' mobility/travel behavior.

In the past, the method of data acquisition was the main obstacle to the use of geolocation data, owing to the data's low validity (in more detail e.g. [19, 20], or [21]). Earlier research on mobile operator data had to solve problems of data representativeness and evidential value when generalizing to the population (see e.g., [23], or [2]). Although this shortcoming is not an obstacle in technical fields aimed at measuring the volume of journeys made or data transmitted, in social geography the generalizability of the data to the population, and the projection of spatial behavior patterns onto the entire society in space and time, are essential. For this purpose, a unique model was created that includes a complete range of interconnected processes, captures the mobility of the population, and projects it onto the social and settlement networks.

2 Model of Data Acquisition Process

The whole model (see Fig. 1) is based on the presumption that mobile phones move together with their users for most of the day (mentioned in e.g., [20], or [21]). Based on this assumption, the model eliminates records created by devices other than mobile phones, thereby largely eliminating the problem of duplicate records for a single user with multiple devices. Similarly, rarely used SIM cards that do not produce enough records in the network are neglected. Furthermore, the assumption of high mobile phone penetration in the population is also crucial. In general, both assumptions can be considered fulfilled in the contemporary societies of developed countries. After this reduction of the base dataset, approximately 10.3 million SIM cards were included in the analysis in each of the 4 measurements carried out. We distinguish two elementary states a SIM card can be in: stay or movement. The records in the geolocation network carry information only about the “stay”; “movement” is inferred from a change of stay. For example, if a SIM card (as part of a periodic update) logs in twice consecutively to different BTS transmitters, its location of stay has changed, and it can be inferred that the SIM card has moved in the meantime.
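The inference of movement from consecutive stays can be sketched in Python as follows. The sketch assumes that each SIM card's records are available as a time-ordered list of (minute, BTS identifier) pairs; the activity filter and its threshold `min_records` are hypothetical stand-ins for neglecting rarely used SIM cards, and no correction for cell jitter is applied at this stage.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str]  # (minutes since start of the period, BTS identifier)

def keep_active(sims: Dict[str, List[Record]], min_records: int = 50) -> Dict[str, List[Record]]:
    """Drop rarely used SIM cards (hypothetical threshold on the number of records)."""
    return {sim: recs for sim, recs in sims.items() if len(recs) >= min_records}

def infer_movements(records: List[Record]) -> List[Tuple[str, str, int, int]]:
    """Infer movements from the time-ordered records of one SIM card.

    Only stays are observed; a movement is inferred whenever two consecutive
    records come from different transmitters. Each movement is returned as
    (from_bts, to_bts, last_time_at_origin, first_time_at_destination).
    """
    moves = []
    for (t0, a), (t1, b) in zip(records, records[1:]):
        if a != b:
            moves.append((a, b, t0, t1))
    return moves

recs = [(0, "A"), (5, "A"), (12, "B"), (20, "B"), (31, "A")]
print(infer_movements(recs))  # [('A', 'B', 5, 12), ('B', 'A', 20, 31)]
```

Because every switch of transmitter counts as a movement here, this naive version is exactly what is vulnerable to cell jitter.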

When detecting the movement or stay of a SIM card, the model must resolve the mismatch between the administrative boundaries of municipalities and the boundaries of the transmitters' service areas. In reality, one BTS transmitter commonly serves several municipalities or parts of them at the same time. In addition, the service areas of different transmitters commonly overlap. Based on the signal coverage maps of individual BTS transmitters, a “cell network” is therefore created. The network of cells is continuous, without gaps or overlaps, and covers the whole territory of Czechia. A cell represents an area in which a located SIM card will be served (with the highest probability) by one particular transmitter. Each BTS transmitter usually carries 3 antennas oriented in different directions (at 120° angles), each with its own cell. The territorial detail of this network is therefore very high.

The detection of stay/movement itself and its assignment to particular territorial units (municipalities) take place in the “cell-mapping process”, an advanced algorithm designed specifically for this purpose. This tool distributes the measured records among specific settlements (municipalities) according to the amount of intravillan (built-up area) of each settlement that extends into a given cell (service area of the signal transmitter), while also reflecting the population density of each settlement. It is a complex computation that compares all defined cells with the land-use coverage map, especially the built-up areas of municipalities, of which there are more than 6 thousand in Czechia. Here, any building or otherwise urbanized area is considered built-up, except roads and railways. If several built-up areas extend into one cell, the algorithm evaluates the size of the built-up area of each municipality extending into that cell. The population density of each municipality (obtained from official state statistics) is also included in the computation. Taking population density into account is essential for accurate modeling: a single cell may contain a housing estate belonging to one municipality and an industrial complex belonging to another, and although the two may occupy the same area, they can be expected to have very different numbers of residents. Individual SIM cards with a defined “stay” in a given cell are then distributed among the municipalities fully automatically based on this computation.
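The distribution step can be sketched as follows. The text states that both the built-up area of each municipality extending into a cell and the municipality's population density enter the computation; multiplying the two into a single weight is our simplifying assumption, and the input dictionaries are hypothetical structures rather than the model's actual data format.

```python
from collections import defaultdict
from typing import Dict

def map_cells_to_municipalities(
    cell_counts: Dict[str, float],            # SIM cards with a stay in each cell
    built_up: Dict[str, Dict[str, float]],    # cell -> {municipality: built-up km2 inside the cell}
    density: Dict[str, float],                # municipality -> inhabitants per km2
) -> Dict[str, float]:
    """Split each cell's SIM cards among municipalities by (built-up area x density)."""
    totals: Dict[str, float] = defaultdict(float)
    for cell, sims in cell_counts.items():
        weights = {m: area * density[m] for m, area in built_up.get(cell, {}).items()}
        total = sum(weights.values())
        if total == 0:
            continue  # simplification: cells without any built-up area stay unassigned
        for m, w in weights.items():
            totals[m] += sims * w / total
    return dict(totals)
```

In the usage below, cell `c1` contains equal built-up areas of a dense municipality X and a sparse municipality Y; the density weighting assigns most of its SIM cards to X, mirroring the housing-estate versus industrial-complex example in the text.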

The accuracy of the distribution of SIM cards across the territory (the cell-mapping process) depends mainly on the quality of the map of the signal coverage network, i.e., the service cells. In our model, the situation is even more complicated, as the data are obtained from all 3 mobile network providers in Czechia, each of which uses its own network of BTS transmitters and therefore has its own signal coverage maps. In addition, each provider has several layers of cell networks depending on the technology used (3G, 4G, 5G). The inputs to this calculation therefore vary, and the “cell-mapping process” algorithm recalculates the entire task whenever any input changes, e.g., due to technical repairs to part of the BTS network. In our case, the inputs changed not only between individual measured periods but also between individual days of each measurement.

In conjunction with the clustering algorithm, the cell-mapping process can eliminate the unwanted effects of so-called “cell jitter”, i.e., random switching between neighboring transmitters serving the same location. It occurs especially when one transmitter is overloaded or, e.g., when weather conditions change the signal strength of neighboring transmitters at a given location (mostly near cell borders). The result is an apparent change in the SIM card's position, and therefore a “movement” in the resulting dataset, even though the SIM card has not moved at all (for more details see [19], or [20]).

The clustering algorithm is an extension of the cell-mapping process. It monitors and recalculates individual short-term switches between neighboring cells (transmitters). Only cases in which the SIM card logs in repeatedly in another cell and the stay there lasts at least 30 min are considered a change of stay (i.e., movement). All other logins to other transmitters within the cluster of neighboring cells are treated as a continuation of the current stay.
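A minimal sketch of the 30-minute rule is given below, assuming records are (minute, cell) pairs and a hypothetical `neighbours` mapping defines each cell's cluster of adjacent cells. A run of consecutive records in a neighbouring cell is absorbed into the current stay unless it contains at least two records spanning at least 30 minutes. Treating switches to non-neighbouring cells as genuine changes of stay, and measuring a run's duration from its first to its last record, are assumptions of this sketch rather than details given in the text.

```python
from itertools import groupby
from typing import Dict, List, Set, Tuple

Record = Tuple[int, str]  # (minutes since start, cell identifier)

def _runs(records: List[Record]) -> List[Tuple[str, int, int, int]]:
    """Group consecutive records in the same cell into (cell, start, end, count)."""
    out = []
    for cell, grp in groupby(records, key=lambda r: r[1]):
        times = [t for t, _ in grp]
        out.append((cell, times[0], times[-1], len(times)))
    return out

def smooth_jitter(records: List[Record], neighbours: Dict[str, Set[str]],
                  min_minutes: int = 30) -> List[Tuple[str, int, int]]:
    """Return stays as (cell, start, end) after absorbing short switches to neighbour cells."""
    stays: List[list] = []
    for cell, start, end, count in _runs(records):
        if not stays:
            stays.append([cell, start, end])
            continue
        current = stays[-1][0]
        too_short = count < 2 or (end - start) < min_minutes
        if cell == current or (cell in neighbours.get(current, set()) and too_short):
            stays[-1][2] = end                 # continuation of the current stay
        else:
            stays.append([cell, start, end])   # change of stay, i.e. a movement
    return [tuple(s) for s in stays]
```

For example, a single login to neighbouring cell B at minute 15 between records in cell A is absorbed, whereas three records in B spanning 40 minutes start a new stay.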

The model also compensates for other undesirable effects that weaken the evidential value of the data, such as the share of virtual operators using the network, the ownership of multiple SIM cards by one user, or, conversely, not owning a mobile phone and thus not using any SIM card. These effects cannot be technically (physically) eliminated or excluded from the dataset, as they are an inherent feature of the data. For that reason, mathematical adjustments were made to compensate for them. By applying these compensations (coefficients unique to each municipality), the model no longer represents the number of measured SIM cards but the expected number of moving people. The compensations were based on a questionnaire survey on the ownership/non-ownership of SIM cards (more than 8,000 respondents) and on measurements of territorial differences in the shares of virtual operators in the signaling network. All these aspects are taken into account by the data acquisition model, whose process is shown in Fig. 1.
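To illustrate how a per-municipality coefficient can turn SIM-card counts into an expected number of people, the sketch below combines three hypothetical inputs: the share of virtual-operator SIM cards assumed to be missing from the measured data, the average number of SIM cards per phone owner, and the share of people without a mobile phone. The text does not give the actual formula, so this decomposition is an assumption for illustration only.

```python
from typing import Dict

def compensation_coefficient(mvno_share: float, sims_per_owner: float,
                             non_owner_share: float) -> float:
    """Hypothetical coefficient converting measured SIM cards into expected persons.

    measured SIMs / (1 - mvno_share)  -> all SIMs (assumes virtual-operator SIMs are not captured)
    all SIMs / sims_per_owner         -> phone owners (corrects multiple-SIM ownership)
    owners / (1 - non_owner_share)    -> all persons (adds people without a phone)
    """
    return 1.0 / ((1.0 - mvno_share) * sims_per_owner * (1.0 - non_owner_share))

def expected_persons(sim_counts: Dict[str, float],
                     coefficients: Dict[str, float]) -> Dict[str, float]:
    """Apply the municipality-specific coefficient to each municipality's SIM count."""
    return {m: n * coefficients[m] for m, n in sim_counts.items()}
```

Because each input can differ between municipalities (e.g., the measured share of virtual operators), the resulting coefficient is municipality-specific, as the text describes.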

In addition, the model contains relocation mechanisms capable of retroactively correcting model errors in assigning records to individual territorial units (i.e., in the cell-mapping process). The relocation process consists of several steps, both automated and manual, in which municipalities where the cell-mapping process may have failed to detect stays correctly are identified. Thanks to the relocation mechanism, which contains elements of machine learning, the entire data acquisition model can adjust its settings between successive measurements.

The relocation mechanism responds to the main problem of the entire data acquisition model: its validity cannot be reliably verified, as there are no reference data for comparison. Only one parameter can be compared, namely the expected number of residents (defined in more detail later in the text) against the number of inhabitants of the municipality according to official state statistics. Although the two data sources measure different indicators, both capture the same phenomenon, i.e., the number of people living in the locality. The two figures should not be identical, but they also cannot be diametrically different. In 90–95% of all municipalities, the differences from official statistics on the number of inhabitants were minimal. Where the differences were significant (greater than ±25%), the relocation mechanism was applied. Its automated part searches for cases in which SIM cards may have been incorrectly distributed from cells to the territories of municipalities. In practice, such errors appeared mainly as neighboring pairs of extremely undervalued and extremely overvalued municipalities. In such cases, the mechanism proposes moving individual measured SIM cards between them. After these transfers are applied, however, some SIM cards are assigned to different municipalities and should therefore receive different coefficients within the data acquisition model (Fig. 1, left part), so the individual steps are repeated (starting from the cell-mapping process). Automatic relocation reduces the share of municipalities showing extreme differences between the number of residents and the official population to 1–2%.
In these cases, it is then necessary to assess individually whether they are local anomalies caused by the model settings or a condition justified by geographical reality. For example, some smaller municipalities that are important tourist centers or spas showed, in individual measurements, a number of residents several times higher than their official population. If such values are justified in this way, the anomalies represent important information that must be preserved in the data.
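The automated part of the relocation mechanism can be sketched as below: municipalities whose modeled residents deviate from the official population by more than ±25% are flagged, and transfers are proposed from overvalued municipalities to undervalued neighbours. The `neighbours` adjacency and the rule of moving the smaller of the remaining excess and deficit are hypothetical; the described mechanism also includes machine-learning elements and manual review, and after any transfer the cell-mapping and compensation steps would be rerun.

```python
from typing import Dict, List, Set, Tuple

def flag_municipalities(model: Dict[str, float], official: Dict[str, float],
                        tol: float = 0.25) -> Dict[str, float]:
    """Relative gap (model - official) / official for municipalities beyond +/- tol."""
    gaps = {m: (model[m] - official[m]) / official[m] for m in model}
    return {m: g for m, g in gaps.items() if abs(g) > tol}

def propose_transfers(model: Dict[str, float], official: Dict[str, float],
                      neighbours: Dict[str, Set[str]],
                      tol: float = 0.25) -> List[Tuple[str, str, float]]:
    """Propose (from, to, amount) moves between neighbouring over-/undervalued municipalities."""
    flagged = flag_municipalities(model, official, tol)
    # positive = excess (overvalued), negative = deficit (undervalued)
    remaining = {m: model[m] - official[m] for m in flagged}
    proposals = []
    for m in sorted(flagged):
        for n in sorted(neighbours.get(m, set())):
            if remaining[m] <= 0:
                break  # nothing (left) to move out of m
            if remaining.get(n, 0) < 0:
                amount = min(remaining[m], -remaining[n])
                proposals.append((m, n, amount))
                remaining[m] -= amount
                remaining[n] += amount
    return proposals
```

A flagged municipality such as a spa town may be a genuine anomaly, which is why, in the described process, remaining cases are assessed manually rather than corrected automatically.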

Fig. 1. Model of the data acquisition process and the mechanism of the labeling process

In the final phase, in addition to the operations described above, the data are projected onto the population, which ensures that the data model is representative of the population. The entire model is thus calibrated to the official population of the state. The calibration was kept to an absolute minimum, as the population measured by the model differed from the official population of the state by only a few percentage points.
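Calibration to the official population can be sketched as a single proportional scaling factor. That the projection rescales all municipalities uniformly is our assumption; the text states only that the model is calibrated to the official population of the state.

```python
from typing import Dict, Tuple

def calibrate_to_official(model: Dict[str, float],
                          official_total: float) -> Tuple[Dict[str, float], float]:
    """Scale modeled counts so that they sum to the official national population.

    Returns the scaled counts and the scaling factor applied.
    """
    total = sum(model.values())
    factor = official_total / total
    return {m: v * official_total / total for m, v in model.items()}, factor
```

When the modeled total is within a few percent of the official figure, the factor stays close to 1, which is consistent with the calibration being applied only to a minimal extent.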

At the same time, sufficient anonymization of the final data is ensured. Consistent multi-phase anonymization makes it impossible to link specific persons to specific records in the datasets. As part of this process, some data are also deliberately altered slightly so that they cannot potentially be associated with specific persons. In practice, this applies only to very small municipalities with few residents and a small volume of interactions.

The model is flexible in terms of the output databases it produces. In its default configuration, it produces a total of 15 attributes at the level of individual municipalities, structured into 3 basic interconnected datasets: a) statistical data for individual municipalities and the characteristics of their residents, b) an OD matrix of commuting directions distinguishing a total of 6 types/intensities of commuting, and c) the average number of people present in each municipality in every hour of the week (24/7), broken down by particular attributes.

The method of assigning attributes/labels to individual users in the network is also unique. It takes place within the labeling process, one of the steps of the data acquisition model (Fig. 1). Essentially, the method does not monitor the actual volumes of trips made but analyzes the commuting rhythms and overall spatial commuting behavior of each SIM card user during each of the 4 measured periods (28 days each). The list of individual labels and the process of their assignment are shown in Fig. 1 (right part). Specifically, these are the resident attribute (R1), i.e., the place where a person most often spends the night; 3 types of commuting: daily, weekly, and occasional (C1–C3); overnight visitor (OV); one-time visitor (V); second residence (R2); and not classified stay. During the monitored period (28 days), attribute “labels” are assigned to each SIM card according to its unique pattern of spatial behavior. Labels are assigned by tracking the locations an individual visits and analyzing the frequency and number of visits, the total time spent at the destination, the periodicity/repeatability of these movements, and even the time of day when the movement/stay took place. Each individual (SIM card user) can have only one label per municipality but can have labels for several municipalities: they can be a resident in one municipality, commute to another for work or school, commute to another for services, be an occasional commuter to yet another, etc. The entire data acquisition and label assignment runs in two variants. One is general, and the other is limited to working days during working hours (Mon–Fri, 6 am–6 pm). In the latter, labels are assigned only to users who fulfill the specific characteristics of the given type of relationship within that time window (WH, working hours).
Thus, two sets of labels are assigned independently, and the entire assignment algorithm runs twice.
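The label assignment can be illustrated with a deliberately simplified rule set. The per-municipality statistics (`nights`, `days`, `weeks` within a 28-day period) and every threshold below are hypothetical and chosen only for illustration; the described process also uses the total time spent, periodicity, and time of day. As in the text, each SIM card receives exactly one label for each municipality it visited.

```python
from typing import Dict

def assign_labels(stats: Dict[str, Dict[str, int]]) -> Dict[str, str]:
    """Assign one label per visited municipality to a single SIM card.

    stats maps municipality -> {"nights": ..., "days": ..., "weeks": ...}
    counted over a 28-day period. All thresholds are illustrative assumptions.
    """
    labels: Dict[str, str] = {}
    # R1: the municipality where the user most often spends the night
    home = max(stats, key=lambda m: stats[m]["nights"])
    if stats[home]["nights"] > 0:
        labels[home] = "R1"
    for m, s in stats.items():
        if m in labels:
            continue
        if s["nights"] >= 4 and s["weeks"] >= 3:
            labels[m] = "R2"              # second residence: repeated nights
        elif s["nights"] >= 1:
            labels[m] = "OV"              # overnight visitor
        elif s["days"] >= 12:
            labels[m] = "C1"              # daily commuting
        elif s["weeks"] >= 3:
            labels[m] = "C2"              # weekly commuting
        elif s["weeks"] >= 2:
            labels[m] = "C3"              # occasional commuting
        elif s["days"] == 1:
            labels[m] = "V"               # one-time visitor
        else:
            labels[m] = "not classified"
    return labels
```

Running the same function on statistics restricted to working hours (Mon–Fri, 6 am–6 pm) would produce the second, independent label set described in the text.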

As a result of the labeling process, the output databases do not contain specific measured values for a particular day or period; instead, each attribute represents the number of people who exhibit a given type of behavior. The outputs are therefore no longer geolocation data as such but spatio-temporally aggregated statistics derived from them.
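The step from individual labels to aggregated statistics can be sketched as a simple count of users per (municipality, label) pair. The input structure is hypothetical; in the described model, the municipality-specific compensation coefficients and anonymization would also be applied before publication.

```python
from collections import Counter
from typing import Dict, Tuple

def aggregate_labels(user_labels: Dict[str, Dict[str, str]]) -> Dict[Tuple[str, str], int]:
    """Count users per (municipality, label); individual SIM identifiers are discarded."""
    table: Counter = Counter()
    for labels in user_labels.values():
        for municipality, label in labels.items():
            table[(municipality, label)] += 1
    return dict(table)
```

The output keeps only counts per municipality and label, which is why it no longer describes any particular day or any particular SIM card.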

3 Usability of the Resulting Databases

The resulting databases are a unique source of information on the residence and movement of persons over the long term. The data produced by this model are currently available from a total of 4 measurements carried out in 2021–2023 and capture the specifics of mobility patterns in all seasons. They thus provide a comprehensive basis for building a mobility model of the inhabitants and evaluating their travel behavior patterns.

The stays and movements detected by this method turn out to be comparable in accuracy to the reference data of national statistics. In addition, thanks to their volume, detail, and continuous monitoring, the data can also reveal the nature of relationships that conventional approaches cannot identify.

The identified territorial relations faithfully describe the specifics of the spatial organization of the settlement system, its hierarchical levels, and their mutual interactions. They thus express the territorial structure of social differentiation.

One of the most significant manifestations of the territorial dimension of the residential structure is the daily commute to work, school, or other activities that take up most of the day. These processes can be captured in the data in two ways: 1) Using the OD matrix of inter-municipal commuting links. These links can be divided by type of commuting relationship, depending on the definitions of the individual assigned labels. These data form the foundation for delineating functional microregions based on natural interactions. An example of such a mapping of commuting relations is shown in Fig. 4, which is discussed later in the text. 2) Using the dataset on the number of people present in each municipality in individual hours. From these data, a municipal occupancy model can be created, as presented in Fig. 2 for the Czech capital Prague and its wider hinterland (Central Bohemia region).

Fig. 2. Mobility model for Prague and the Central Bohemia region. Current attendance of individuals in the municipalities during the day, relative to the average occupancy of each municipality.

Mobility between the hinterland and a strong core such as Prague is very evident in the daily commute. For each municipality, the model shows its basic commuting parameter, namely whether incoming or outgoing commuting prevails. The mobility model here shows relative values of the incoming/outgoing commuting ratio during the day. For that reason, changes in the number of people present during the day are more pronounced in small villages, where even a small number of people moving causes a significant relative change. Conversely, in large cities and towns, especially Prague, even an enormous daytime increase in the number of people present (up to 200,000 commuters) means a relatively small change. In addition to Prague, other large towns in the region with populations of around 20–60 thousand also appear as important commuting centers. Municipalities with prevailing incoming commuting are also concentrated in the immediate vicinity of Prague, where important employers are located. Smaller municipalities farther away mainly serve as residential settlements in which outgoing commuting prevails over incoming. This regional mobility model shows the basic elements of the spatial interactions between the core and peripheral areas at the microregional level of the settlement system, which are defined precisely by the daily commute to work and school.
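A short worked example shows why relative changes are larger in small municipalities. Assuming that daytime presence equals residents minus outgoing plus incoming commuters, the relative change is (incoming - outgoing) / residents. The village and city figures below are hypothetical, not values from the measurements.

```python
def relative_daytime_change(residents: int, incoming: int, outgoing: int) -> float:
    """Relative change in the number of people present when commuters come and go."""
    return (incoming - outgoing) / residents

village = relative_daytime_change(residents=300, incoming=10, outgoing=90)
city = relative_daytime_change(residents=1_000_000, incoming=200_000, outgoing=50_000)
print(f"village: {village:+.1%}, city: {city:+.1%}")  # village: -26.7%, city: +15.0%
```

The city's absolute net gain (150,000 people) is vastly larger than the village's net loss (80 people), yet the village's relative change is almost twice as large in magnitude, which is what a relative occupancy map emphasizes.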

In more detail, we can follow the daily rhythm of each of the 6,254 municipalities in the Czech Republic. In addition to the number of people present at every hour of every day of the week (24/7), the model also provides a breakdown into particular types of individuals. These types are derived from the assigned attributes and therefore represent the relationship of each present person to the given place. The different daily and weekly rhythms of particular municipalities can be seen in Fig. 3, which shows 4 municipalities of different sizes and with different regional specifics.

Fig. 3. Currently present individuals in the municipality (every hour of the week, 24/7), divided into categories (sorted from the bottom): resident (R1), commuters of types I, II, and III (C1–C3), overnight visitor (OV), visitor (V), transiting and other (not classified stay). The y-axis scale is adjusted individually for each case to allow relative comparison.

The capital city of Prague has a completely different and relatively complementary weekly rhythm to that of Špindlerův Mlýn, a well-known ski resort. On the other hand, the weekday daily rhythm shows the complementarity of Prague and the small village of Křivoklát (Central Bohemia region), which is known as a day-trip tourist destination (with a medieval castle on its territory). A very specific case is the very small village of Ovčáry (Central Bohemia region), on whose territory a huge automotive factory is located. Its daily rhythm shows very clearly the hours of shift changes in this factory as sharp, regularly recurring increases in the number of people present.

From another point of view, the structure of the present population shows that in Prague residents dominate and the daily rhythm is defined by commuters of the 1st type (daily commuting). Conversely, in Špindlerův Mlýn, local residents are a minority throughout the week, and overnight visitors dominate. In Křivoklát, the rhythm of the village on working days depends on the behavior of residents, whereas at weekends it is influenced by 2nd-type commuters, one-time visitors, and people without an assigned label (random visitors or people in transit). In Ovčáry, the main component of the present population is daily commuters to the industrial area.

Such data are very valuable, especially for spatial planning, the development of technical infrastructure, and crisis management. They provide real, accurate, and very detailed information about the mobility needs of residents of all municipalities in the state.

The processed databases can be visualized with web GIS tools, which also make the data freely accessible. These cartographic tools visualize the direction and strength of individual commuting relationships and supplement them with other interactive graphic elements (charts, tables, etc.).

Fig. 4. A web map application showing the daily inter-municipal commuting links, based on the OD matrix of the number of people commuting between municipalities, with the possibility of a breakdown into particular types of commuting according to the assigned labels.

The example output from this web map application (see Fig. 4) shows 2 cases: 1) all recorded inter-municipal commuting links, of which there are more than 41 thousand in the database, and 2) the primary commuting links of individual municipalities. Especially in the map of primary commuting links (on the right), the high internal integrity of the regions (clusters of individual links around important centers) and their relative closure to the outside (no primary links between clusters) are clearly visible. This map application will be available from this link after the approval of all project outputs.
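Extracting primary links from an OD matrix, and grouping the municipalities they connect, can be sketched as follows. Defining the primary link as the single strongest outgoing flow of each municipality, and grouping by connected components (union-find) of the resulting graph, are our assumptions; this only illustrates the "clusters of links" visible on the map and is not the regionalization procedure of the cited works.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def primary_links(flows: Dict[Tuple[str, str], int]) -> Dict[str, str]:
    """For each origin, keep only the destination with the largest commuting flow."""
    best: Dict[str, Tuple[str, int]] = {}
    for (origin, dest), n in flows.items():
        if origin == dest:
            continue  # ignore intra-municipal records
        if origin not in best or n > best[origin][1]:
            best[origin] = (dest, n)  # ties: the first link seen is kept
    return {o: d for o, (d, _) in best.items()}

def link_clusters(primary: Dict[str, str]) -> List[Set[str]]:
    """Group municipalities connected by primary links (union-find, links treated as undirected)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for origin, dest in primary.items():
        parent[find(origin)] = find(dest)
    groups: Dict[str, Set[str]] = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return sorted(groups.values(), key=sorted)
```

Because each municipality contributes only its strongest link, weak cross-regional flows disappear, which is why clusters around centers separate so clearly in the map of primary links.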

These data enable further advanced spatial analyses, for example, delineating regions of territorial concentration and, based on them, creating an overall socio-economic regionalization of the state (see e.g., [2, 18, 21], or [25]). The freely accessible web application allows the general public to display, use, analyze, or download the data, and to share it in various ways. New web map applications can also be built on this data by using the primary dataset as a web map service.

4 Summary

This article shows the usability of the geolocation data of mobile operators, together with a new model for their acquisition, for the purpose of population mapping and identifying patterns of travel behavior. The output databases created by applying the described data acquisition model enable subsequent geographic analyses that identify functionally integrated regions and their centers at different hierarchical levels. Based on commuting to particular centers and on the intensity and volume of these interactions, relatively closed (in terms of the functional closure of their interactions) and internally integrated regions can be identified.

Primarily, the method is set up to identify functional microregional commuting links. Microregions are territories in which residents should be able to carry out all the daily activities necessary and important for their everyday lives. Their centers are the primary commuting destinations for their surroundings and provide a sufficient range of job opportunities, primary and secondary education, health services, shops, etc. Centers of higher hierarchical levels, which provide higher-order services, do not need to be visited daily. Nevertheless, thanks to the robustness of the analyzed data, it is also possible to define centers of higher (mesoregional) or lower (submicroregional) levels. The method therefore enables a complete socio-economic regionalization of the state at individual hierarchical size levels, including their hierarchical relationships. Socio-economic regionalization using data from this model yields natural commuting regions based on the natural concentration of people in the territory, assessed through long-term monitoring. The concentration processes identified in the data reflect both commuting flows of varying intensity, i.e., the movement of individuals, and the concentration of people in a specific place during individual hours of each day of the week, i.e., the stay of individuals.

This approach is used not only for regionalization itself, in the sense of academic research on the development of the state's settlement structure and its hierarchical organization, but also for practical purposes, especially in public and private service delivery. For individual municipalities, information on the stay and movement of people in the territory appears essential for implementing regional development policies, urban planning, and crisis management when dealing with risks. The location of public administration services is to be addressed as part of this project. There is, however, also great potential for using these data to locate private services such as local shops, pharmacies, parcel delivery boxes, ATMs, and possibly even educational and medical facilities. All public infrastructure, as well as the technical infrastructure of each municipality, should thus respect the natural processes of concentration and the rhythms of commuting behavior of citizens from surrounding areas. Such steps can be expected to significantly raise citizens' standard of living and improve public and private service delivery.

This approach was used in the Czech Republic for a comprehensive revision of the spatial units of the public administration structure. The purpose was to harmonize administrative units with natural commuting regions, in particular to ensure that public administration offices are located where people naturally concentrate. This leads to the streamlining and deconcentration of public administration and its adaptation to the needs of citizens. This application example also suggests that the approach is transferable and applicable both to other territories (states) and in other scientific fields.